fatal trap 28

Visseroth · Oct 16, 2012

I have been trying to re-add and rebuild a couple of drives that I removed for a firmware update. They are HD204UI drives with a known firmware bug that can cause data corruption. Every now and then, though, I get a kernel panic that locks up my server: "fatal trap 28".

Does anyone know how I can check the logs and post them up here so you guys have an idea as to WTF is going on?

Server specs are as such...
SuperMicro X7DBN 2U
12GB of FB DIMMs
Dual 800W power supplies
2U Tripp Lite battery backup
12 x 2TB Samsung HD204UI
IDE Compact Flash adapter with 8GB CF card

William Grzybowski · Oct 16, 2012

Do you have autotune enabled? Or anything under System->Tunables?

If so, disable autotune, then delete tunables and then restart. In that order.

Otherwise try enabling it :)

12GB of RAM might be too low for about 20TB of data, depending on how much of that is used.

Stephens · Oct 16, 2012

You don't say which version of FreeNAS you're using. One shell command is "tail /var/log/messages", which displays the end of the log. Another is "less /var/log/messages", which allows scrolling ('q' to exit). But I'd take a pic of that trap screen if you can, because the logs may not contain any useful information about a crash.
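
As a rough sketch of the log check described above, something like the following could be pasted into the shell. The function names `recent_log` and `panic_lines` are hypothetical helpers, not FreeNAS commands, and the default path is simply the /var/log/messages file mentioned in this post.

```shell
# Hypothetical helpers; not part of FreeNAS.

# Print the last N lines of a log file
# (defaults: 50 lines of /var/log/messages).
recent_log() {
    tail -n "${2:-50}" "${1:-/var/log/messages}"
}

# Print only the lines that mention a panic or a fatal trap
# (case-insensitive match).
panic_lines() {
    grep -i -E 'panic|fatal trap' "${1:-/var/log/messages}"
}
```

For example, `recent_log /var/log/messages 100` would show the last 100 lines. If the machine locked up hard, the panic may never have been written to disk, so an empty result from `panic_lines` does not rule out a crash.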

Visseroth · Oct 16, 2012

Ahh. Good to know.
Well, I do apologize for not listing what version I'm running.
I'm running FreeNAS-8.2.0-RELEASE-p1-x64 (r11950), and if it panics again I'll take a picture and post it up. Thanks for the response.

Yes I do have Autotune enabled. I enabled it to try and get better transfer rates as it sometimes liked to transfer down around the 30 to 40MB/s range.

As for the RAM, 12GB seems to be good according to my reports. It doesn't fill up for some time, until I put some load on it with large data transfers, and I'm pretty much the only one who abuses this thing with huge data transfers from time to time.

This image shows memory usage thus far and it's been running since last night at about 1am and has been resilvering since. Resilver is 84.37% complete with 2h14m left on the clock.

Stephens · Oct 16, 2012

How much of your 20TB is used? The problem with RAM is that it's only maxed out in certain situations. So you could run just fine on low RAM for a month, then hit one of those situations and panic. I'm not saying it's a RAM issue, but I am saying, "I don't max my RAM until..." could be an area of concern. You only show 2 hours of history in your pic, if I'm reading that correctly. Because of the way the forum downsizes attached images, it's often better to host them somewhere like TinyPic and then link to them (though I can read your attachment).

Visseroth · Oct 16, 2012

Currently I have about 10TB in use of the 15.7TB. I did recently upgrade it from 8GB to 12GB and it helped with my speed a LOT. Now I get a steady 80 to 100MB/s, whereas before I could drag it down to 30 to 40MB/s if I had multiple streams going.

Here's the past month...

Stephens · Oct 16, 2012

12GB of RAM for 10TB of data is fine (guideline is 1GB RAM for each TB of data). Try William's autotune suggestions.
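
To make the rule of thumb concrete, here is a minimal sketch of the comparison. The function name `ram_meets_guideline` is hypothetical, and it only handles whole numbers of GB and TB.

```shell
# Hypothetical helper: succeeds when RAM (in GB) is at least the data size
# (in TB), following the 1GB-per-TB rule of thumb. Whole numbers only.
ram_meets_guideline() {
    ram_gb=$1
    data_tb=$2
    [ "$ram_gb" -ge "$data_tb" ]
}

# The figures from this thread: 12GB of RAM, about 10TB of data in use.
if ram_meets_guideline 12 10; then
    echo "12GB RAM meets the guideline for 10TB of data"
else
    echo "12GB RAM is below the guideline for 10TB of data"
fi
```

By the same check, the earlier 8GB configuration would have fallen short for 10TB of data.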

Visseroth · Oct 16, 2012

Will do.
Well, it seems I have a similar bug to someone else on the forum. Upon reboot, the drives I just resilvered have to be resilvered again because they are no longer part of the pool. That sucks!
I even verified that the resilver had completed at the command prompt with zpool status. Bahh!

Visseroth · Oct 17, 2012

action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

NAME                      STATE     READ WRITE CKSUM
storage                   DEGRADED     0     0     0
  raidz2                  DEGRADED     0     0     0
    ada7p2                ONLINE       0     0     0
    ada8p2                ONLINE       0     0     0
    ada9p2                ONLINE       0     0     0
    ada10p2               ONLINE       0     0     0
    ada11p2               ONLINE       0     0     0
    4338291436823275182   UNAVAIL      0     0     0  was /dev/ada12
  raidz2                  DEGRADED     0     0     0
    ada0p2                ONLINE       0     0     0
    ada1p2                ONLINE       0     0     0
    10712885056224717528  UNAVAIL      0     0     0  was /dev/ada2
    ada3p2                ONLINE       0     0     0
    ada4p2                ONLINE       0     0     0
    ada5p2                ONLINE       0     0     0
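
For anyone who wants to pull just the missing devices out of output like the above, here is a small sketch. The function name `unavail_devices` is a hypothetical helper, and it assumes the column layout shown here (device name first, state second).

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name (first column) of every device whose STATE column is UNAVAIL.
unavail_devices() {
    awk '$2 == "UNAVAIL" { print $1 }'
}
```

It could be used as `zpool status storage | unavail_devices`, where `storage` is the pool name from this thread.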

Visseroth · Oct 17, 2012

Updated to RC3. We'll see if the kernel panic comes back and whether my resilver survives a reboot...

  pool: storage
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct 17 00:12:17 2012
        29.3G scanned out of 13.3T at 77.7M/s, 49h34m to go
        4.87G resilvered, 0.22% done
config:

NAME                                            STATE     READ WRITE CKSUM
storage                                         ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/483f8738-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/489be9c2-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/48fb2fa7-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/49857218-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/49f37d65-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/7037831f-1829-11e2-8c8f-003048348d66  ONLINE       0     0     0  (resilvering)
  raidz2-1                                      ONLINE       0     0     0
    gptid/6a03112a-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6a5ec34c-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/f3a5ec46-1829-11e2-8c8f-003048348d66  ONLINE       0     0     0  (resilvering)
    gptid/6b19c631-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6b724a0b-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6bcbb689-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0

errors: No known data errors

Visseroth · Oct 21, 2012

Seems my kernel panic is gone and my resilvering works as intended with the update to the RC. Thanks for the help guys.
