michael.samer
Hello
I'm using a FreeNAS installation (11.1-U4) on an ESXi VM with the following hardware:
12 cores (2 * 6 vCPU)
32GB RAM
1 * e1000 NIC, 192.168.0.2/24
1 * vmxnet3 NIC, 192.168.80.2/24
1 * 10GB boot disk
1 * 33GB swap disk
2 * 500TB SAN volumes
Network sharing is done via AD (2012R2) and CIFS only.
Since we upgraded (from 11.1 to 11.1-U3) we have had a severe memory leak on our hands.
After a few hours of uptime, one of the smbd processes starts consuming all available RAM and then starts to swap. The maximum I saw was 260GB, by which point all services had died and the system had become unresponsive over SSH:
last pid: 35878; load averages: 1.28, 1.43, 1.34 up 6+15:25:23 07:00:03
56 processes: 1 running, 48 sleeping, 1 zombie, 6 waiting
CPU: 0.0% user, 0.0% nice, 12.9% system, 0.0% interrupt, 87.1% idle
Mem: 25G Active, 72K Inact, 6639M Wired, 142M Free
ARC: 3842M Total, 133M MFU, 2872M MRU, 3172K Anon, 66M Header, 768M Other
2566M Compressed, 20G Uncompressed, 8.03:1 Ratio
Swap: 34G Total, 34G Used, K Free, 100% Inuse, 4K In, 4K Out
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
34284 root 16 35 0 199M 109M vmwait 5 59:52 0.14% python3.6
35878 root 1 20 0 8212K 3540K CPU2 2 0:00 0.11% top
12195 root 18 31 0 55604K 13612K uwait 7 16:58 0.04% consul
11568 root 1 20 0 16436K 2296K select 5 3:10 0.03% vmtoolsd
516 root 1 20 0 13776K 5072K select 3 0:03 0.03% sshd
9372 root 1 20 0 9176K 800K select 5 0:35 0.01% devd
12137 root 1 20 0 147M 8004K kqread 0 0:24 0.00% uwsgi
10496 root 1 20 0 10432K 10544K select 2 0:17 0.00% ntpd
13314 root 1 20 0 9004K 2436K select 2 0:08 0.00% zfsd
21414 root 1 46 0 260G 0K pfault 3 510:33 0.00% <smbd>
12269 root 12 20 0 101M 11448K nanslp 6 13:23 0.00% collectd
12126 root 1 20 0 102M 20044K select 6 5:44 0.00% python3.6
49525 root 15 20 0 232M 90656K umtxn 7 4:49 0.00% uwsgi
21280 root 1 20 0 93544K 48820K select 2 4:46 0.00% winbindd
21204 root 1 20 0 128M 65472K select 1 2:24 0.00% smbd
13560 root 18 20 0 33936K 4584K uwait 2 1:07 0.00% consul
9571 root 3 20 0 31940K 0K WAIT 1 1:00 0.00% <syslog-n
12188 root 20 20 0 46680K 7228K uwait 4 0:40 0.00% consul-al
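To see how fast it grows, I've started logging the smbd memory footprint once a minute with a small shell loop, something like this (a rough sketch; the log path is arbitrary):
# log PID, virtual size and resident size of every smbd process each minute
while true; do
    date
    ps -ax -o pid,vsz,rss,command | grep '[s]mbd'
    sleep 60
done >> /var/log/smbd-mem.log
The bracketed '[s]mbd' just keeps grep from matching its own command line.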
As I have only 2-3 users (with large amounts of data) active, I can't see any cause for this other than a memory leak. The 11.0 installation (smb=3.6.8) I'm using at home doesn't seem to show this effect, and it handles about the same amount of data.
I already tried the "fixes" from https://forums.freenas.org/index.php?threads/samba-using-up-most-of-the-ram.61039/
to no avail.
In the best case, the kernel kills the runaway smbd about once a day; in the worst case, the whole system just spews error messages (SSH/console) like:
swap_pager_getswapspace(3): failed
and no key press was accepted, nor did Ctrl+Alt+Del help in any way.
Until about three weeks ago the system did not show these high loads. When the RAM consumption starts, the CPU usage also climbs quickly (with only 1GBit of network load, so I can't grasp why):
root@DEVNETNAS:~ # swapinfo
Device 1K-blocks Used Avail Capacity
/dev/mirror/swap0.eli 2097152 2097104 48 100%
/dev/gptid/f52b45ff-3d4c-11e8-8 33554432 29001512 4552920 86%
Total 35651584 31098616 4552968 87%
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
97258 root 1 100 0 155G 17498M CPU2 2 16.8H 177.47% smbd
57713 root 1 20 0 8212K 3532K CPU7 7 0:00 2.76% top
12195 root 18 30 0 55604K 16200K uwait 1 14:42 1.19% consul
516 root 1 20 0 13776K 5264K select 1 0:02 0.30% sshd
32729 root 1 20 0 128M 92328K select 1 2:13 0.23% smbd
11568 root 1 20 0 16436K 2444K select 4 2:39 0.08% vmtoolsd
34284 root 16 20 0 199M 112M kqread 0 50:39 0.00% python3.6
12269 root 12 20 0 99M 10460K nanslp 1 11:23 0.00% collectd
32791 root 1 20 0 92976K 53688K select 2 7:21 0.00% winbindd
12126 root 1 22 0 102M 21020K select 5 4:58 0.00% python3.6
49525 root 15 28 0 231M 102M umtxn 0 2:42 0.00% uwsgi
13560 root 18 20 0 33936K 5608K uwait 2 1:06 0.00% consul
9571 root 1 20 0 31932K 3228K kqread 0 0:51 0.00% syslog-ng
12188 root 19 20 0 46552K 6904K uwait 2 0:34 0.00% consul-al
9372 root 1 20 0 9176K 880K select 4 0:31 0.00% devd
12137 root 1 20 0 147M 23904K kqread 1 0:22 0.00% uwsgi
11724 root 1 52 0 13004K 4756K select 3 0:17 0.00% sshd
10496 root 1 20 0 10432K 10544K select 6 0:15 0.00% ntpd
32720 root 1 20 0 49908K 12504K select 4 0:15 0.00% winbindd
13314 root 1 20 0 9004K 2780K select 2 0:07 0.00% zfsd
13558 root 18 32 0 33424K 5164K uwait 1 0:07 0.00% consul
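Next time it balloons I plan to capture some state from the bloated process before the kernel kills it. From what I've read, smbd can report its internal talloc allocations via smbcontrol, which should show where the memory actually sits (untested on the FreeNAS build so far; 97258 is the PID from the top output above):
# ask smbd for a breakdown of its talloc memory pools
smbcontrol 97258 pool-usage > /tmp/smbd-pool.txt
# record the process's virtual memory map for comparison
procstat -v 97258 > /tmp/smbd-vm.txt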
If I can't solve this within the next few days, I'll be forced to migrate to a different ZFS-based system, which I'd be very unhappy to do.
Cheers
Michael