[SOLVED] Simply had to reinstall FreeNAS on a new USB stick and import my old config.
Hi all,
I think an earlier power loss caused a hardware problem that now crashes my server every night at 3AM.
History
Everything used to work just fine. Uptimes ran into the hundreds of days, and the server only went down when I scheduled a FreeNAS update (9.0 > 9.3 > 9.10, plus updates within those trains, always STABLE).
I'm running 4 jails: 2 VIMAGE ones (one for Iodine, one for OpenVPN), 2 normal ones (one that just downloads stuff, and one that offers a media server to the LAN).
Machine
- Memory stick with FreeNAS installation (FreeNAS 9.10 stable w/ latest updates installed): SanDisk 16GB Cruzer Fit Flash Drive
- Mobo+CPU: C2750D4I (CPU Avoton C2750 Octa-Core Processor)
- RAM: 4x Kingston Technology ValueRAM 8GB 1600MHz DDR3L PC3-12800 ECC CL11 DIMM 1.35V
- Disks: 6x WD Red 4TB (config RAIDZ2) + Samsung 840 series Pro 256GB
- PSU: Silverstone SST-ST45SF-G
They were doing public works in my street on Thursday 2016-06-09 around noon. They cut off the electricity, and the server crashed. I don't think it was doing anything in particular at that moment...
I turned the machine back on around midnight (1AM) and started all the jails. In the morning I noticed that nothing had happened overnight and that the jails (apart from the autostarting iodine one) were turned off. By that time I was at work, and apart from a slight unease I just started everything back up and didn't think any more of it.
On Saturday I noticed the exact same thing had happened again. I let it go on for a few days while turning off features and scheduled tasks, but it kept happening.
On Tuesday 2016-06-14 I discovered, in /data/crash, a single dump for every day the server crashed, all with about the same timestamp (ranging from 3:03 to 3:05).
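For anyone who wants to check their own box the same way, here is a minimal sketch that lists the files in the crash directory with their modification times, so a recurring time of day stands out. The function name `list_crash_dumps` is made up for this example, and it assumes the file modification time is close to the crash time, which I haven't verified.

```python
from datetime import datetime
from pathlib import Path


def list_crash_dumps(crash_dir="/data/crash"):
    """Return (modification time, file name) pairs for crash_dir, oldest first.

    Returns an empty list if the directory does not exist.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    entries = []
    for path in directory.iterdir():
        if path.is_file():
            # Assumption: mtime is a good enough proxy for when the crash happened.
            mtime = datetime.fromtimestamp(path.stat().st_mtime)
            entries.append((mtime, path.name))
    return sorted(entries)


def format_listing(entries):
    """Render the pairs as 'YYYY-MM-DD HH:MM:SS  name' lines."""
    return [f"{mtime:%Y-%m-%d %H:%M:%S}  {name}" for mtime, name in entries]
```

Calling `print("\n".join(format_listing(list_crash_dumps())))` on the server prints one line per dump.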
Here are the interesting parts of those dumps:
So it turns out some automated script that calls 'pfctl' runs at (probably?) 3AM, part of the usual FreeNAS housekeeping. But it causes a crash every single time, in exactly the same spot.
Jun11.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821c8a64
stack pointer = 0x28:0xfffffe0860792b90
frame pointer = 0x28:0xfffffe0860793920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 13036 (pfctl)
Jun12.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 04
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821c89f4
stack pointer = 0x28:0xfffffe0860c03b90
frame pointer = 0x28:0xfffffe0860c04920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 69265 (pfctl)
Jun13.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 06
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821d89f4
stack pointer = 0x28:0xfffffe0860de3b90
frame pointer = 0x28:0xfffffe0860de4920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 39449 (pfctl)
Jun14.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821d89f4
stack pointer = 0x28:0xfffffe0860a49b90
frame pointer = 0x28:0xfffffe0860a4a920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 98598 (pfctl)
Jun15.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821d89f4
stack pointer = 0x28:0xfffffe0860743b90
frame pointer = 0x28:0xfffffe0860744920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 56035 (pfctl)
Jun16.dump.gz
Fatal trap 12: page fault while in kernel mode
cpuid = 7; apic id = 0e
fault virtual address = 0x8
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff821d89f4
stack pointer = 0x28:0xfffffe0860966b90
frame pointer = 0x28:0xfffffe0860967920
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 71528 (pfctl)
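To compare the dumps without eyeballing them, here is a rough sketch that pulls three fields out of each trap header and counts how often each value occurs. The field values are copied from the six dumps above. The helper names `parse_trap` and `summarize` are just for this example.

```python
import re
from collections import Counter

# (instruction pointer, current process) per dump, copied from the headers above.
# The fault virtual address is 0x8 in all six, so it is filled in below.
RAW = {
    "Jun11": ("0x20:0xffffffff821c8a64", "13036 (pfctl)"),
    "Jun12": ("0x20:0xffffffff821c89f4", "69265 (pfctl)"),
    "Jun13": ("0x20:0xffffffff821d89f4", "39449 (pfctl)"),
    "Jun14": ("0x20:0xffffffff821d89f4", "98598 (pfctl)"),
    "Jun15": ("0x20:0xffffffff821d89f4", "56035 (pfctl)"),
    "Jun16": ("0x20:0xffffffff821d89f4", "71528 (pfctl)"),
}

# Rebuild the relevant header lines in the same "name = value" form as the dumps.
DUMPS = {
    name: (
        "fault virtual address = 0x8\n"
        f"instruction pointer = {ip}\n"
        f"current process = {proc}\n"
    )
    for name, (ip, proc) in RAW.items()
}

FIELD_RE = re.compile(
    r"^(fault virtual address|instruction pointer|current process)\s*=\s*(.+)$",
    re.MULTILINE,
)


def parse_trap(text):
    """Extract the three compared fields from one trap header."""
    fields = {key: value.strip() for key, value in FIELD_RE.findall(text)}
    # "13036 (pfctl)" -> "pfctl"; the PID differs every night, the name does not.
    match = re.search(r"\((.+)\)", fields.get("current process", ""))
    fields["process name"] = match.group(1) if match else None
    return fields


def summarize(dumps):
    """Count how often each value of each field occurs across all dumps."""
    parsed = [parse_trap(text) for text in dumps.values()]
    keys = ("fault virtual address", "instruction pointer", "process name")
    return {key: Counter(p.get(key) for p in parsed) for key in keys}


summary = summarize(DUMPS)
for key, counts in summary.items():
    print(key, dict(counts))
```

All six dumps share the fault address 0x8 and the process name pfctl. The instruction pointer is identical in the last four and slightly different in the first two. My guess is that the difference comes from the update I installed in between, but that is an assumption. A fault address that small usually points to a read through a NULL pointer plus a small offset, but I can't confirm that from these headers alone.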
I've updated FreeNAS in between the crashes, but nothing changed.
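To narrow down which scheduled job calls pfctl, one option is to search the periodic scripts for the word. This is only a sketch. It assumes the 3AM job is one of the FreeBSD periodic scripts under /etc/periodic, which I have not confirmed for FreeNAS. The function name `find_scripts_mentioning` is made up for this example.

```python
from pathlib import Path


def find_scripts_mentioning(word, root="/etc/periodic"):
    """Return sorted paths of regular files under root whose text contains word.

    Returns an empty list if root does not exist.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    hits = []
    for path in root_path.rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file, skip it
        if word in text:
            hits.append(str(path))
    return sorted(hits)
```

Calling `find_scripts_mentioning("pfctl")` on the server returns the candidate scripts. Running one of them by hand during the day should show whether it triggers the same trap, at the cost of one more crash.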
Cause?
I've done my fair share of googling since the crashes started, and almost every link ends up being about faulty hardware.
e.g.
- https://forums.freenas.org/index.php?threads/fatal-trap-12-page-fault-while-in-kernel-mode.10268/ : similar error, but not related to pfctl
- https://forums.freebsd.org/threads/16575/ : FreeBSD, no solution for me
- https://forums.freenas.org/index.ph...rap-12-page-fault-while-in-kernel-mode.25542/ : similar error, but not related to pfctl - was a corrupt flashdrive
- https://forums.freenas.org/index.php?threads/please-help-me-fatal-trap-12.10321/ : not pfctl - RAM issues
- https://forums.freenas.org/index.ph...ult-while-in-kernel-mode-cifs-transfer.12598/ : not pfctl - RAM issues
- https://forums.freenas.org/index.php?threads/fatal-trap-12-page-fault-while-in-kernel-mode.24667/ : possibly corrupt flashdrive (never came back to confirm)
- https://bugs.pcbsd.org/issues/3912 : 3AM reboot, too little RAM
- https://bugs.pcbsd.org/issues/3789 : exactly what I'm experiencing, but was a bug and should have been fixed since then (and it never happened before the powerloss)
I've been running memtest on the machine for over a full day now, but it hasn't turned up anything yet. Memtest even says "nothing found, press escape to exit" but I'll keep it running for a bit longer. I've attached a screenshot from this morning (now over 24h and nothing changed).
Could anything but RAM cause my problem? If nothing turns up, I'll try writing a fresh copy of FreeNAS to a new stick and running from that. Maybe the flash drive itself got damaged by the power loss (is that a thing?).
Any help/insight/suggestions/... is welcome. More diagnostic tests apart from memtest are welcome.
Thank you