Crash/Panic with 'kmem_map too small'

Status
Not open for further replies.

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Community,

Had my system go down today due to what I believe was a 'kmem_map too small' panic, going by /data/crash/info.last

Google research and forum searches suggest disabling autotune and removing any tunables it left behind for kernel memory. I have never enabled autotune. Some posts suggest using autotune, but I've heard people slander it and equate it to the death of your data/system.

Here are the tunables I have actually been playing with to look at L2ARC; these were set a month or two ago:
vfs.zfs.l2arc_write_boost 81920000
vfs.zfs.l2arc_write_max 81920000
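
For scale: both tunables are byte counts, and as far as I understand it, vfs.zfs.l2arc_write_max caps how much gets written to the L2ARC per feed interval. A quick Python sketch of the conversion (the helper name is just for illustration, not part of any FreeNAS tool):

```python
# Values copied from the tunables above; both are byte counts.
L2ARC_WRITE_MAX = 81920000
L2ARC_WRITE_BOOST = 81920000

MIB = 1024 * 1024


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes (hypothetical helper)."""
    return n_bytes / MIB


for name, value in (("l2arc_write_max", L2ARC_WRITE_MAX),
                    ("l2arc_write_boost", L2ARC_WRITE_BOOST)):
    print(f"vfs.zfs.{name}: {bytes_to_mib(value):.3f} MiB")
```

Both settings work out to 78.125 MiB.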


The system's activity at the time wasn't too far out of the norm: VMs running, a VM template being deployed to a new virtual machine, another VM being deleted from disk, standard VMware stuff. No exceptional load from other running VMs, etc.

Anyone have personal experience with this?

Anyone have recommended fixes or troubleshooting steps to narrow this down?

I feel like my L2ARC is worthless since it usually has a hit ratio of less than 10%, so I suppose I could remove it from each of the two pools. But I don't have any proof that it caused this issue, so I can't tell whether removing it would really be a fix or is just unrelated.

Thanks!

FreeNAS 9.3-Stable 64-bit
FreeNAS Platform:
SuperMicro 826 (8XDTN+)
(x2) Intel(R) Xeon(R) CPU E5200 @ 2.27GHz
72GB RAM ECC (Always!)
APC3000 UPS (Always!)
Intel Pro 1000 (integrated) for CIFS
Two Intel Pro 1000 PT/MT Dual Port Card (Four total ports for iSCSI)
Two SLOGS (one for each iSCSI pool) - Intel 3500 SSD
IBM M1015 (IT Mode) HBA (Port 0) -> BPN-SAS2-826EL1 (12 port backplane with expander)
IBM M1015 (IT Mode) HBA (Port 0) -> SFF-8088 connected -> HP MSA70 3G 25 bay drive enclosure
HP SAS HBA (Port 0) -> SFF-8088 connected -> HP DS2700 6G 25 bay drive enclosure
Pool1 (VM Datastore) -> 24x 3G 146GB 10K SAS into 12 vDev Mirrors
Pool2 (VM Datastore) -> 12x 6G 300GB 10K SAS into 6 vDev Mirrors
Pool3 (Media Storage) -> 8x 3G 2TB 7200 SATA into 1vDev[Z2]
Network Infrastructure:
Cisco SG200-26 (26 Port Gigabit Switch)

Four separate vLANs/subnets for iSCSI
  • em2 - x.x.101.7/24
  • em3 - x.x.102.7/24
  • em0 - x.x.103.7/24
  • em1 - x.x.104.7/24
Separate vLAN/subnet for CIFS (Always!)
  • itb - x.x.0.7/24

*edit*
I didn't really mention it above: the attached file is my info.last from /data/crash. I looked through the debug output also, but it mostly looked like boot information from after the system came back up.
 

Attachments

  • info.txt (658 bytes)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Normally you shouldn't enable autotune. It resolves the crash issue you are having, but it also limits performance somewhat because it prevents the system from using 100% of RAM.

I would enable autotune and reboot the system. In your case you obviously are more concerned with crashes than with losing a small amount of performance. :P
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
I'll give it a go and see what happens. Thanks.
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
I enabled autotune based on your feedback. I don't have a viral YouTube video to show for it, but it did set the following two items:

vfs.zfs.arc_max 57222189056
vm.kmem_size 58295930880
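
To make those two numbers easier to read, here is a small Python sketch that converts them to GiB. The 72 GiB figure is the nominal installed RAM from my signature, which is an assumption here; the kernel will report somewhat less usable memory.

```python
GIB = 1024 ** 3

# Values set by autotune, copied from above (bytes).
arc_max = 57222189056
kmem_size = 58295930880

# Assumption: nominal installed RAM, not what the kernel actually reports.
nominal_ram = 72 * GIB


def to_gib(n_bytes):
    """Convert a byte count to gibibytes (hypothetical helper)."""
    return n_bytes / GIB


print(f"vfs.zfs.arc_max = {to_gib(arc_max):.2f} GiB")
print(f"vm.kmem_size    = {to_gib(kmem_size):.2f} GiB")
print(f"difference      = {to_gib(kmem_size - arc_max):.2f} GiB")
print(f"arc_max / RAM   = {arc_max / nominal_ram:.0%}")
```

The arithmetic shows vm.kmem_size sitting exactly 1 GiB above vfs.zfs.arc_max, and arc_max at roughly 74% of nominal RAM. That says nothing by itself about how autotune picked the values.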



Questions:
1) Were these values chosen based on the system's workload at the time autotune was enabled, or by some other means unrelated to what was happening?

2) Will they change as the system encounters different loads?

3) When I run 'arcstat.py -f arcsz' I get 40G back, but my uneducated brain thinks it should be saying around 57G. What am I not understanding? The system has 72GB total RAM, and some of it is used for L2ARC and other system stuff. Is that number expected?
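
To put question 3 in numbers: vfs.zfs.arc_max is an upper limit on the ARC rather than a size it is guaranteed to reach, so a reading below it is not surprising in itself. A rough Python sketch of the comparison, assuming the "40G" printed by arcstat.py means 40 GiB (I have not confirmed which unit it uses):

```python
GIB = 1024 ** 3

arc_max = 57222189056   # bytes, the value autotune set
arcsz = 40 * GIB        # assumption: arcstat.py's "40G" is 40 GiB


def arc_fill_fraction(size_bytes, limit_bytes):
    """Fraction of the ARC ceiling currently in use (hypothetical helper)."""
    return size_bytes / limit_bytes


print(f"arc_max as GiB: {arc_max / GIB:.2f}")
print(f"ARC in use:     {arc_fill_fraction(arcsz, arc_max):.0%} of arc_max")
```

Note that the ~57G figure comes from reading arc_max in decimal gigabytes; in binary units the same limit is about 53.29 GiB, so under the assumption above the ARC is at roughly 75% of its ceiling.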
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Additionally, I noticed a change in the Memory reporting graph after enabling autotune. Maybe this graph is only showing ARC and not total system memory?

Memory Reporting.PNG
 

Attachments

  • ARC Related.PNG (112.8 KB)