Marcet (Contributor, joined May 31, 2013, 193 messages)
But I'm afraid it won't be a good idea: https://forums.freenas.org/index.php?threads/crash-panic-with-kmem_map-too-small.42294/#post-273866
> Just for giggles, are there any BIOS updates or anything like that for your system? Maybe try resetting your BIOS to defaults and see what happens.

I have the latest BIOS and BMC for the board.
> I wouldn't try reverting to older stuff (that's how you brick your board).

I won't do it ;)
> But, you could try defaulting the BIOS and BMC and only change the stuff you *must* change like boot device. If it starts working properly then you can start setting the BIOS to better (read: more optimized) settings. :D

When I updated the BMC, I also reset the settings (as it was not recommended to keep them).
I've had 1 or 2 random reboots with my backup system. It's this system:
Motherboard: SuperMicro A1SAi-2750F mini-ITX
CPU: Intel Atom C2750 CPU, 8-core 2.4 GHz
RAM: Kingston 4x 8GB 2Rx8 1G x 72-Bit PC3L-12800 CL11 204-Pin ECC SODIMM
Drives: 6x WD Red 3 TB, RAIDZ2
It is only used for ZFS replication, nothing more. Solved by enabling autotune, as I read in a thread on the forum. I'll try to find it again.
I would be interested in this information too, especially which FreeNAS version you are running and which settings autotune made.
BTW: CPU-/Mainboard- and RAM-wise my hardware is identical. I didn't see any instability (running almost 24/7 for about 12 months now) and never fiddled with autotune and/or tunables so far.
> You have done quite a few things to change the configuration and it all seems to have started when you added the SSDs. I'm not sure what you have set up for jails, but hopefully you have a configuration backup from before you added the SSDs to your system, and then maybe you could just restore that configuration file and see if everything is working again.

I know. I should have been more careful and left some time between actions.
> Additionally I would re-enable all the watchdog timers you disabled. Basically roll your system back to a state where it worked.

For now, I've moved my jails from the SSDs to the main HDD-based volume and physically removed the SSDs.
> If you upgraded the motherboard firmware, I would leave that alone, do not go backwards. If you really think a firmware update caused this problem, open up a ticket with Supermicro, but it's going to be a tough sell since you have made so many changes in a short period of time, so figuring out what the issue is could take a while.

Ok.
> Also, your configuration lists that you run the latest version of 9.10. Since there was a recent update to this, maybe try rolling back a version as well. Of course, I'm assuming you jump on the updates as soon as they come out.

As a matter of fact, I installed 9.10 when I started to suspect a watchdog problem.
> Some advice for the future... Take your time making changes to a functional system. Add something and give it time to see if it causes an adverse reaction. This means if you perform a BIOS upgrade, wait a few days to ensure no problems crop up. If you are not good at keeping notes on changes you make to your system, I'd wait a week. I myself would wait a few hours while I put the system through its paces and try to break it. If it lasts, time to move on, but that is just me.

I will be more careful in the future. Thanks for the advice.
> As for the power supply, if you suspect it is the problem, I'd recommend you find one you could install to see if the problem goes away. Even though I know how to use an O'scope, most people do not have one at home, and the ones I have at work are classified, so I can't take one home, nor could I bring in my computer to connect to it. Having a good spare power supply is a very useful tool to have on hand. And just because the power supply you have is a Seasonic (my favorite brand), it doesn't mean it hasn't failed. Although at this point in time I don't suspect the PSU is the culprit. Run Memtest and a CPU stress test; these can typically root out a PSU issue. Ensure you have all your drives connected to pull the maximum load as well, and that includes the SSDs if you want to see if they are causing the PSU to overload.

I have no 600 W PSU to compare against, but I will run the stress test in a few days to check the system's stability under load.
> Additionally, ensure the cables you are connecting to your SSDs are in good condition.

No SATA cables involved, but SAS cables and backplanes.
> Good luck.

Thank you very much for your support and this very clear briefing. Appreciate it (y)
> Do you mean all watchdogs? BIOS, hardware jumper and daemon?

Absolutely. It was working before, so I say try to restore the original configuration if you can, or at least do so over time. Let's say all is good after your 3-day test; then I'd add in all of the watchdog timers. If it breaks, remove half of the timers and try again. Try to sort it out in the least amount of time possible. It may have been the SSDs or how they were configured; let's hope so, just so you can get back to a fully operational and reliable system.
> I know. I should have been more careful and left some time between actions.

We all get like that, it's normal.
But you know how it is... I ran 2 FreeNAS servers based on old PCs for about 5 years and never had any problem other than failing hard drives.
So I was full of enthusiasm when I finally built a proper server-grade machine ;)
Let whoever has never been excited by tech cast the first stone :D
[root@nas-backup] /data/crash# less info.0
Dump header from device /dev/dumpdev
  Architecture: amd64
  Architecture Version: 1
  Dump Length: 122368B (0 MB)
  Blocksize: 512
  Dumptime: Mon Mar 14 15:47:31 2016
  Hostname: nas-backup.hsnetworks
  Magic: FreeBSD Text Dump
  Version String: FreeBSD 9.3-RELEASE-p31 #0 r288272+33bb475: Wed Feb 3 02:19:35 PST 2016
    root@build3.ixsystems.com:/tank/home/stable-builds/FN/objs/os-base/amd64/tank/home/stable-builds/FN/FreeBSD/src/syskmem_malloc(16777216): kmem_map too small: 17581543424 total allocated
  Panic String: kmem_malloc(16777216): kmem_map too small: 17581543424 total allocated
  Dump Parity: 1880663058
  Bounds: 0
  Dump Status: good
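To put the two numbers in that panic string in perspective, here is a minimal Python sketch that pulls them out and converts them to friendlier units. It is only an illustration, not a FreeNAS tool: `parse_kmem_panic` is a hypothetical helper name, and the regular expression simply assumes the message always has the shape seen in this one dump.

```python
import re

# Panic string copied verbatim from the info.0 dump above.
PANIC = "kmem_malloc(16777216): kmem_map too small: 17581543424 total allocated"


def parse_kmem_panic(panic):
    """Return (requested_bytes, allocated_bytes), or None if the text does not match.

    Hypothetical helper for illustration; the pattern is an assumption
    based on the single panic message shown in this thread.
    """
    m = re.search(
        r"kmem_malloc\((\d+)\): kmem_map too small: (\d+) total allocated",
        panic,
    )
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


requested, allocated = parse_kmem_panic(PANIC)
print(f"requested: {requested / 2**20:.0f} MiB")          # size of the failing request
print(f"already allocated: {allocated / 2**30:.2f} GiB")  # total taken from kmem_map
```

For this dump, the failing request was 16 MiB, made when roughly 16.37 GiB had already been allocated from kmem_map.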
IIRC, the MX200s had an odd firmware bug or two that was patched in an update.
> Glad you figured it out.

Thanks. I'll keep you informed after the firmware update in a few days. I will not rush.
5 2016/04/13 11:32:03 Watchdog 2 #0xca Watchdog 2 Timer Interrupt - Assertion
6 2016/04/13 11:32:04 Watchdog 2 #0xca Watchdog 2 Hard Reset - Assertion
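If the event log gets long, a small script can help line up watchdog interrupts with the resets that follow them. The sketch below is only a rough illustration in Python: the column layout (entry ID, date, time, then free text) is an assumption inferred from the two lines above, and `parse_entry` is a hypothetical helper, not part of any IPMI tool.

```python
from datetime import datetime

# The two event-log entries quoted above, one per line.
LOG = """\
5 2016/04/13 11:32:03 Watchdog 2 #0xca Watchdog 2 Timer Interrupt - Assertion
6 2016/04/13 11:32:04 Watchdog 2 #0xca Watchdog 2 Hard Reset - Assertion
"""


def parse_entry(line):
    """Split one log line into (entry_id, timestamp, description).

    Assumed layout: <id> <YYYY/MM/DD> <HH:MM:SS> <free-text description>.
    """
    entry_id, date, time, rest = line.split(maxsplit=3)
    when = datetime.strptime(f"{date} {time}", "%Y/%m/%d %H:%M:%S")
    return int(entry_id), when, rest


entries = [parse_entry(line) for line in LOG.splitlines() if line.strip()]
interrupts = [e for e in entries if "Timer Interrupt" in e[2]]
resets = [e for e in entries if "Hard Reset" in e[2]]

# Seconds between the first timer interrupt and the first hard reset.
delay = (resets[0][1] - interrupts[0][1]).total_seconds()
print(f"hard reset followed the timer interrupt after {delay:.0f} s")
```

For the two entries here, the hard reset is logged one second after the timer interrupt.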
How do I disable the watchdog properly?

# THIS FILE IS RESERVED FOR THE EXCLUSIVE USE OF FREENAS CONFIG SYSTEM.
# Please edit /etc/rc.conf instead.