fatal trap 28

Status
Not open for further replies.

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
I have been trying to re-add and rebuild a couple of drives that I removed for a firmware update. They are HD204UI drives with a known firmware bug that can cause data corruption. Every now and then, though, I get a kernel panic that locks up my server: "fatal trap 28".

Does anyone know how I can check the logs and post them here so you guys have an idea as to WTF is going on?

Server specs are as such...
SuperMicro X7DBN 2U
12GB of FB DIMMs
Dual 800W power supplies
2U Tripplite Battery Backup
12 2TB Samsung HD204UI
IDE Compact Flash adapter with 8GB CF card
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
Do you have autotune enabled? Or anything under System->Tunables?

If so, disable autotune, then delete tunables and then restart. In that order.

Otherwise try enabling it :)

12GB of RAM might be too low for about 20TB of data, depending how much of that is used.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
You don't say which version of FreeNAS you're using. To see the end of the log, run "tail /var/log/messages" from the shell. To scroll through the whole log, use "less /var/log/messages" (press 'q' to exit). I'd also take a pic of that trap screen if you can, because the logs may not contain any useful information about a crash.
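If you'd rather script it, here is a minimal Python sketch of the same idea: read the last few lines of the log and keep only the ones that look like a panic. The function names and the keyword pattern are my own assumptions, not anything FreeNAS ships, and a hard lockup may leave nothing in the log at all.

```python
import re
from collections import deque

# Assumed keywords; a real panic may be logged differently or not at all.
PANIC_PATTERN = re.compile(r"panic|fatal trap", re.IGNORECASE)


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file, similar to `tail -n`."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the final `count` lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


def panic_lines(lines):
    """Keep only the lines that mention a panic or a fatal trap."""
    return [line for line in lines if PANIC_PATTERN.search(line)]
```

For example, panic_lines(tail_lines("/var/log/messages", 200)) would list any matching lines among the last 200.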
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
ahh. Good to know.
Well, I do apologize for not listing what version I'm running.
I'm running FreeNAS-8.2.0-RELEASE-p1-x64 (r11950), and if it panics again I'll take a picture and post it up. Thanks for the response.

Yes I do have Autotune enabled. I enabled it to try and get better transfer rates as it sometimes liked to transfer down around the 30 to 40MB/s range.

As for the RAM, 12GB seems to be good according to my reports. It doesn't fill up for some time, until I put some load on it with large data transfers, and I'm pretty much the only one who abuses this thing with huge data transfers from time to time.

memory-1h.png

This image shows memory usage thus far and it's been running since last night at about 1am and has been resilvering since. Resilver is 84.37% complete with 2h14m left on the clock.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
How much of your 20TB is used? The problem with RAM is it's only maxed in certain situations. So you could run just fine on low RAM for a month, then hit one of those situations and panic. I'm not saying it's a RAM issue, but I am saying, "I don't max my RAM until..." could be an area of concern. You only show 2 hours of history in your pic, if I'm reading that correctly. Because of the way the forum downsizes attached images, it's often better to host them somewhere like TinyPic and then link to them (though I can read your attachment).
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
Currently I have about 10TB of the 15.7TB in use. I did recently upgrade it from 8GB to 12GB and it helped with my speed a LOT. Now I get a steady 80 to 100MB/s, whereas before I could drag it down to 30 to 40MB/s if I had multiple streams going.

Here's the past month...
memory-1m.jpg
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
12GB of RAM for 10TB of data is fine (guideline is 1GB RAM for each TB of data). Try William's autotune suggestions.
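As a quick sanity check, the rule of thumb is simple arithmetic. This is only a sketch of the guideline as stated here (1GB per TB); the function names are made up for illustration, and it says nothing about what autotune does.

```python
def ram_guideline_gb(data_tb, gb_per_tb=1.0):
    """RAM suggested by the rule of thumb: 1GB of RAM per TB of data."""
    return data_tb * gb_per_tb


def meets_guideline(installed_gb, data_tb):
    """True when the installed RAM is at least the guideline amount."""
    return installed_gb >= ram_guideline_gb(data_tb)
```

With the numbers from this thread, meets_guideline(12, 10) is true for the 10TB in use, while the old 8GB would have fallen short of it.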
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
Will do.
Well, it seems I have a similar bug to someone else on the forum. Upon reboot, the drives I just resilvered have to be resilvered again because they are no longer part of the pool. That sucks!
I even verified resilver completed at the command prompt with zpool status. Bahh!
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

NAME                      STATE     READ WRITE CKSUM
storage                   DEGRADED     0     0     0
  raidz2                  DEGRADED     0     0     0
    ada7p2                ONLINE       0     0     0
    ada8p2                ONLINE       0     0     0
    ada9p2                ONLINE       0     0     0
    ada10p2               ONLINE       0     0     0
    ada11p2               ONLINE       0     0     0
    4338291436823275182   UNAVAIL      0     0     0  was /dev/ada12
  raidz2                  DEGRADED     0     0     0
    ada0p2                ONLINE       0     0     0
    ada1p2                ONLINE       0     0     0
    10712885056224717528  UNAVAIL      0     0     0  was /dev/ada2
    ada3p2                ONLINE       0     0     0
    ada4p2                ONLINE       0     0     0
    ada5p2                ONLINE       0     0     0
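
For anyone who wants to pick the missing members out of output like this automatically, here is a rough Python sketch. It only does plain text matching on the status listing; the function name is made up, and it assumes the "UNAVAIL ... was /dev/..." layout shown above.

```python
def unavailable_devices(status_text):
    """Return (name, previous_path) for each UNAVAIL line in zpool status text."""
    found = []
    for line in status_text.splitlines():
        parts = line.split()
        # Config rows look like: NAME STATE READ WRITE CKSUM [was /dev/...]
        if len(parts) >= 2 and parts[1] == "UNAVAIL":
            previous = None
            if "was" in parts:
                index = parts.index("was")
                if index + 1 < len(parts):
                    previous = parts[index + 1]
            found.append((parts[0], previous))
    return found
```

Fed the listing above, it would return the two numeric IDs paired with /dev/ada12 and /dev/ada2.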
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
Updated to RC3. We'll see if the kernel panic comes back and if my resilvering survives a reboot...


pool: storage
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Wed Oct 17 00:12:17 2012
29.3G scanned out of 13.3T at 77.7M/s, 49h34m to go
4.87G resilvered, 0.22% done
config:

NAME                                            STATE     READ WRITE CKSUM
storage                                         ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/483f8738-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/489be9c2-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/48fb2fa7-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/49857218-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/49f37d65-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/7037831f-1829-11e2-8c8f-003048348d66  ONLINE       0     0     0  (resilvering)
  raidz2-1                                      ONLINE       0     0     0
    gptid/6a03112a-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6a5ec34c-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/f3a5ec46-1829-11e2-8c8f-003048348d66  ONLINE       0     0     0  (resilvering)
    gptid/6b19c631-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6b724a0b-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0
    gptid/6bcbb689-6f5c-11e1-90b1-003048348d66  ONLINE       0     0     0

errors: No known data errors
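
The "to go" figure in the scan line is just remaining data divided by the current rate, so it can be roughly checked by hand. A small Python sketch, assuming the sizes use binary units (1T = 1024G) and that the rate stays constant; the helper names are made up for illustration.

```python
# Assumed binary multipliers for the K/M/G/T suffixes in the scan line.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert a size string such as '13.3T' or '77.7M' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def resilver_eta_seconds(scanned, total, rate_per_second):
    """Seconds left if the remaining data is scanned at a constant rate."""
    remaining = to_bytes(total) - to_bytes(scanned)
    return remaining / to_bytes(rate_per_second)


def format_hm(seconds):
    """Format a duration in seconds as e.g. '49h44m'."""
    minutes = int(seconds // 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"
```

With the values above, format_hm(resilver_eta_seconds("29.3G", "13.3T", "77.7M")) gives 49h44m, within about ten minutes of the 49h34m reported. The gap is plausibly down to the rounded figures shown in the status output.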
 

Visseroth

Guru
Joined
Nov 4, 2011
Messages
546
Seems my kernel panic is gone and my resilvering works as intended with the update to the RC. Thanks for the help guys.
 