[SOLVED] Crash after a couple mins of copy: Fatal Trap 9: General Protection Fault in kernel mode

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
This is a new build, just trying to load it with data for the first time and keep having issues during copy or import.

Initially, I was trying to do a Disk Import via an attached USB SSD enclosure. It would get to about 30% and then crash. This happened about 4 times and I gave up. I assumed it was the USB enclosure, etc., and after researching, most people suggested just copying over the network.

So then I tried using Robocopy to copy from my old NAS to this new TrueNAS build. Again, it would copy for 3 to 5 minutes, then crash.

From what I researched, I found the data/crash directory and extracted this from the textdump.tar.0.gz (the latest):
Code:
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff80af4b3e
stack pointer           = 0x28:0xfffffe01415f69c0
frame pointer           = 0x28:0xfffffe01415f6a40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (z_wr_iss_h_2)
trap number             = 9
panic: general protection fault
cpuid = 0
time = 1673415926
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01415f67e0
vpanic() at vpanic+0x17f/frame 0xfffffe01415f6830
panic() at panic+0x43/frame 0xfffffe01415f6890
trap_fatal() at trap_fatal+0x385/frame 0xfffffe01415f68f0
calltrap() at calltrap+0x8/frame 0xfffffe01415f68f0
--- trap 0x9, rip = 0xffffffff80af4b3e, rsp = 0xfffffe01415f69c0, rbp = 0xfffffe01415f6a40 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xce/frame 0xfffffe01415f6a40
vm_reserv_alloc_page() at vm_reserv_alloc_page+0x5c5/frame 0xfffffe01415f6aa0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0xb5/frame 0xfffffe01415f6b20
kmem_back_domain() at kmem_back_domain+0x10a/frame 0xfffffe01415f6b90
kmem_malloc_domainset() at kmem_malloc_domainset+0xaf/frame 0xfffffe01415f6c00
keg_alloc_slab() at keg_alloc_slab+0xb0/frame 0xfffffe01415f6c50
zone_import() at zone_import+0xf0/frame 0xfffffe01415f6ce0
cache_alloc() at cache_alloc+0x326/frame 0xfffffe01415f6d50
cache_alloc_retry() at cache_alloc_retry+0x25/frame 0xfffffe01415f6d90
zio_write_compress() at zio_write_compress+0x1a6/frame 0xfffffe01415f6e00
zio_execute() at zio_execute+0x9f/frame 0xfffffe01415f6e40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe01415f6ec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe01415f6ef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe01415f6f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe01415f6f30
--- trap 0x246, rip = 0xbc7f0d20, rsp = 0, rbp = 0 ---
KDB: enter: panic
panic.txt: general protection fault
version.txt: FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS


Also saw that info.0 had the same modified time...here it is:
Code:
root@truenas[/data/crash]# vim info.0
Dump header from device: /dev/ada2p1
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 598016
  Blocksize: 512
  Compression: none
  Dumptime: 2023-01-10 22:45:26 -0700
  Hostname: truenas.local
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS
  Panic String: general protection fault
  Dump Parity: 3525357351
  Bounds: 0
  Dump Status: good


Not sure if there are other logs to examine?
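
For anyone else digging through these: a textdump archive is an ordinary tar file, and besides panic.txt and version.txt (both visible above) it normally carries a few more text files such as msgbuf.txt and ddb.txt. Those other names come from the FreeBSD textdump(4) documentation as I remember it, not from this system, so treat them as an assumption. A minimal Python sketch for reading every member without unpacking by hand (the function name `read_textdump` is made up for illustration):

```python
import tarfile


def read_textdump(path):
    """Return {member name: text} for every regular file in a textdump archive."""
    contents = {}
    # "r:*" lets tarfile detect the gzip compression on its own.
    with tarfile.open(path, mode="r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file entries
            data = archive.extractfile(member).read()
            # Kernel message buffers can contain odd bytes, so decode leniently.
            contents[member.name] = data.decode("utf-8", errors="replace")
    return contents
```

Calling `read_textdump("/data/crash/textdump.tar.0.gz")["panic.txt"]` should then give the panic string on its own; the path is the one from this post.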

I'm not a Unix guy (Windows developer, so somewhat technical, but not on hardware or Linux), so any help is appreciated, ideally in a manner I can hope to understand.
Thanks in advance!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, you're panicking deep in ZFS' write path, which means something is very dodgy.

The starting point for this would be to know what hardware you're using, since that's going to be the first suspect.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
If you are using what's in your signature, the only thing that catches my attention is the Realtek NIC, which should have little to do with your current issue.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
(Reminder to everyone that signatures don't show up on mobile)
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Did a little research (thanks, Google); I would suggest testing your RAM with memtest.

Burn-in is not trivial.
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
(Reminder to everyone that signatures don't show up on mobile)

Ah, didn't know that, here you go:

TrueNAS: CORE 13.0-U3.1
Mobo: Supermicro X9SCM-F
CPU: Intel Xeon E3-1220 V2
RAM: Crucial 32GB (4 x 8GB) 240-Pin DDR3 SDRAM ECC Unbuffered DDR3L 1600 (PC3L 12800) Server Memory Model CT2KIT102472BD160B
Pool: 2 - 8TB WD Red Pro - mirrored
Boot: 1 - Kingston 250GB SSD - boot
PSU: Seasonic FOCUS GX-550
NIC: Realtek 8125 2.5GbE
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
Did a little research (thanks, Google); I would suggest testing your RAM with memtest.

Burn-in is not trivial.

After my post, I found those recommendations. I did a MemTest86 overnight, all passed.

That said, @Ericloewe, with regards to RealTek, yes I found those posts last night as well. But since the NIC was working, I thought I was in the clear. However, the MemTest would freeze after just 4 sec into the test. This repeated 3 or 4 times. I decided to strip out the RealTek NIC, bam, the MemTest was able to proceed! So I was wrong assuming that since I got it to work that meant I was in the clear.

Now I have a networking problem so I can't get into the web gui to even perform a test of the copying that was originally failing.

I had configured the RealTek NIC with a static IP via the TrueNAS web GUI. Now the onboard NICs are not able to get an IP. I'm not sure whether the Tunables I turned on to enable the RealTek NIC are stopping the onboard NICs from working in some capacity (I don't think so, because I can get into the IPMI web GUI from the network), or whether it is something else.

On the TrueNAS screen I chose option 1 to reset both em0 and em1, hoping that would do the trick, but no dice; it still says "The web interface could not be accessed. Please check network configuration."

The network config option just gives the option to "reset" them...what else can I do to debug and get the networking up again?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
I would try a copy over the network using one of the intel NICs on your motherboard ...
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
A Realtek NIC causing both a kernel panic and a memtest freeze is something I have never read about, congrats.
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
I would try a copy over the network using one of the intel NICs on your motherboard ...

I would, but after removing the RealTek NIC, my networking isn't working :( I'll make a new post on that topic as I believe this original issue was most likely due to the RealTek NIC.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I would, but after removing the RealTek NIC, my networking isn't working :( I'll make a new post on that topic as I believe this original issue was most likely due to the RealTek NIC.
Did you reconfigure your TrueNAS network settings to use a different NIC? It's not an automatic thing like Windoze.
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
Did you reconfigure your TrueNAS network settings to use a different NIC? It's not an automatic thing like Windoze.

No, how do I do that from the console? I did choose option 1 to reset em0 and em1, but how do I tell TrueNAS to use the onboard NICs from the console?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
A Realtek NIC causing both a kernel panic and a memtest freeze is something I have never read about, congrats.
I have to agree, I too have never seen that. Very odd, but if it fixed it, all the better.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
No, how do I do that from the console? I did choose option 1 to reset em0 and em1, but how do I tell TrueNAS to use the onboard NICs from the console?
Unfortunately I do not have TrueNAS in front of me, so my crappy memory will have to do.

1. When you select option 1 to reset the NIC, are you presented with more than one NIC?
2. You can delete the NICs and reboot, and the NICs that are physically present "should" show up. I don't think you need to reboot, but I always do as a safe bet.
3. In option 1 I think you can select a NIC and, without doing the reset part, rename it and such. Not that I'd rename it; I wouldn't.
4. It might be option 4 (my memory is not that good) where you configure the LAN IP; you need to select the correct interface and then enter the IP again.

After the IP is set, you should have access to the GUI.

Sorry again, I'm trying to pull this from memory on something I do rarely.
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24

Thanks, it was 3. I had thought the reset would do it, but this time I answered No to the reset prompt and then got additional options to configure the NIC. I can get to the GUI now, thank you!

Now I'll start testing my data load and see what happens.
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
Well, crashed again. Using onboard intel NIC, removed all extra PCIe cards and usb, just the mobo with nothing else.

from info.1:
Code:
Dump header from device: /dev/ada1p1
  Architecture: amd64
  Architecture Version: 4
  Dump Length: 364544
  Blocksize: 512
  Compression: none
  Dumptime: 2023-01-11 10:59:43 -0700
  Hostname: truenas.192.168.0.1
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS
  Panic String: page fault
  Dump Parity: 278830117
  Bounds: 1
  Dump Status: good


from textdump.tar.1.gz:
Code:
<118>Wed Jan 11 10:48:18 MST 2023
<6>pid 1525 (avahi-daemon), jid 0, uid 200: exited on signal 11
<6>pid 1251 (ntpd), jid 0, uid 123: exited on signal 11


Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address   = 0x440
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80af4b3e
stack pointer           = 0x0:0xfffffe014ddce9c0
frame pointer           = 0x0:0xfffffe014ddcea40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (z_wr_iss_2)
trap number             = 12
panic: page fault
cpuid = 3
time = 1673459983
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe014ddce780
vpanic() at vpanic+0x17f/frame 0xfffffe014ddce7d0
panic() at panic+0x43/frame 0xfffffe014ddce830
trap_fatal() at trap_fatal+0x385/frame 0xfffffe014ddce890
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014ddce8f0
calltrap() at calltrap+0x8/frame 0xfffffe014ddce8f0
--- trap 0xc, rip = 0xffffffff80af4b3e, rsp = 0xfffffe014ddce9c0, rbp = 0xfffffe014ddcea40 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xce/frame 0xfffffe014ddcea40
vm_reserv_alloc_page() at vm_reserv_alloc_page+0x5c5/frame 0xfffffe014ddceaa0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0xb5/frame 0xfffffe014ddceb20
kmem_back_domain() at kmem_back_domain+0x10a/frame 0xfffffe014ddceb90
kmem_malloc_domainset() at kmem_malloc_domainset+0xaf/frame 0xfffffe014ddcec00
keg_alloc_slab() at keg_alloc_slab+0xb0/frame 0xfffffe014ddcec50
zone_import() at zone_import+0xf0/frame 0xfffffe014ddcece0
cache_alloc() at cache_alloc+0x326/frame 0xfffffe014ddced50
cache_alloc_retry() at cache_alloc_retry+0x25/frame 0xfffffe014ddced90
zio_write_compress() at zio_write_compress+0x1a6/frame 0xfffffe014ddcee00
zio_execute() at zio_execute+0x9f/frame 0xfffffe014ddcee40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfffffe014ddceec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc2/frame 0xfffffe014ddceef0
fork_exit() at fork_exit+0x7e/frame 0xfffffe014ddcef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe014ddcef30
--- trap 0x28, rip = 0xbc7eed20, rsp = 0, rbp = 0 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 13.1-RELEASE-p2 n245412-484f039b1d0 TRUENAS
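
A quick sanity check, as a sketch: the `time =` values in the two panic messages appear to be Unix epoch seconds, so converting them should reproduce the Dumptime lines from info.0 and info.1. The fixed -0700 offset is taken from those Dumptime lines; the helper name `panic_time` is just for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-7 offset, as shown in the Dumptime lines of info.0 and info.1.
MST = timezone(timedelta(hours=-7))


def panic_time(epoch_seconds, tz=MST):
    """Convert the kernel's 'time = ...' value into a timezone-aware datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz)


for label, epoch in (("first panic", 1673415926), ("second panic", 1673459983)):
    print(label, panic_time(epoch).strftime("%Y-%m-%d %H:%M:%S %z"))
# first panic 2023-01-10 22:45:26 -0700
# second panic 2023-01-11 10:59:43 -0700
```

Both values line up with the Dumptime fields, which suggests each textdump really does belong to the crash it was filed under.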
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
That said, @Ericloewe, with regards to RealTek, yes I found those posts last night as well. But since the NIC was working, I thought I was in the clear. However, the MemTest would freeze after just 4 sec into the test. This repeated 3 or 4 times. I decided to strip out the RealTek NIC, bam, the MemTest was able to proceed!
That doesn't bode well. Even Realtek isn't crappy enough to crash memtest.
Well, crashed again. Using onboard intel NIC, removed all extra PCIe cards and usb, just the mobo with nothing else.

Well, there we go...

You seem to be consistently panicking when allocating memory, apparently with the MMU throwing errors, but with different panics, which is a terrible sign.

First step, try letting memtest run and see what it reports. I suspect it will end up crashing, too, but maybe it'll report useful info before it does.
Once that has been done, you'll probably want to go through your RAM and start removing DIMMs until the system works again.
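
To make the comparison of the two panics concrete, here is a rough Python sketch that pulls the function names out of KDB backtrace lines and drops the panic-handling frames, so the paths that actually faulted can be compared. The sample lines are abbreviated from the two dumps in this thread; the helper names and the list of handler frames are my own choices for illustration, not anything FreeBSD provides.

```python
import re

# Matches lines such as "zio_execute() at zio_execute+0x9f/frame 0xfffffe01415f6e40".
FRAME = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")

# Frames that belong to the panic machinery rather than to the faulting code path.
HANDLERS = {"db_trace_self_wrapper", "vpanic", "panic",
            "trap_fatal", "trap_pfault", "calltrap"}


def frames(backtrace_text):
    """Return the function names of all stack frames, top of stack first."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def faulting_path(names):
    """Drop the panic-handling frames, keeping the code path that trapped."""
    return [name for name in names if name not in HANDLERS]


FIRST = """\
trap_fatal() at trap_fatal+0x385/frame 0xfffffe01415f68f0
calltrap() at calltrap+0x8/frame 0xfffffe01415f68f0
--- trap 0x9, rip = 0xffffffff80af4b3e, rsp = 0xfffffe01415f69c0, rbp = 0xfffffe01415f6a40 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xce/frame 0xfffffe01415f6a40
vm_reserv_alloc_page() at vm_reserv_alloc_page+0x5c5/frame 0xfffffe01415f6aa0
zio_write_compress() at zio_write_compress+0x1a6/frame 0xfffffe01415f6e00
"""

SECOND = """\
trap_fatal() at trap_fatal+0x385/frame 0xfffffe014ddce890
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe014ddce8f0
calltrap() at calltrap+0x8/frame 0xfffffe014ddce8f0
--- trap 0xc, rip = 0xffffffff80af4b3e, rsp = 0xfffffe014ddce9c0, rbp = 0xfffffe014ddcea40 ---
__mtx_lock_sleep() at __mtx_lock_sleep+0xce/frame 0xfffffe014ddcea40
vm_reserv_alloc_page() at vm_reserv_alloc_page+0x5c5/frame 0xfffffe014ddceaa0
zio_write_compress() at zio_write_compress+0x1a6/frame 0xfffffe014ddcee00
"""

print(faulting_path(frames(FIRST)) == faulting_path(frames(SECOND)))  # True
```

The trap numbers differ (9 versus 12), yet the faulting path is identical once the handler frames are removed, which is the pattern described above: the same allocation path failing in different ways.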
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
That doesn't bode well. Even Realtek isn't crappy enough to crash memtest.

Well, there we go...

You seem to be consistently panicking when allocating memory, apparently with the MMU throwing errors, but with different panics, which is a terrible sign.

First step, try letting memtest run and see what it reports. I suspect it will end up crashing, too, but maybe it'll report useful info before it does.
Once that has been done, you'll probably want to go through your RAM and start removing DIMMs until the system works again.

I did run MemTest last night. It ran for 6 hours and came back as passed.

The memory is new and is certified by Supermicro for this mobo.
The mobo is used of course...so I'd think maybe this would be more suspect?

Given all this, is there something else I can try to narrow down memory vs mobo?
 

chadilac

Dabbler
Joined
Dec 27, 2022
Messages
24
You have to wish the damage is within the RAM and not the motherboard.
The mobo is cheap :) $40 - but more time to replace. The RAM wasn't much more, but it was new vs the used mobo.
 