SOLVED some samba processes consume too much cpu load

Status
Not open for further replies.

tarabas

Cadet
Joined
Jun 26, 2013
Messages
7
For about two months now, some smbd processes have been constantly consuming a full CPU core; sometimes there are two such processes. There was no update or other change when the problem started, except that some new clients went online.
The system is FreeNAS-9.10.1-U2 on an HP MicroServer Gen8 with 16 GB RAM and 4 disks in raidz2.
Upgrading to 9.10.2-U5 via the ISO did not change anything.

Code:
 PID USER  PRI  NI  VIRT  RES S CPU% MEM%  TIME+  Command
53010 root  52  0  288M 25464 S 25.4  0.2 11:29.31 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53015 root  21  0  283M 24968 S  1.5  0.2  0:56.75 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
61799 hopfenmue  20  0  334M 28524 S  0.2  0.2  0:04.85 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
56241 twluxmsr  20  0  303M 26840 S  0.1  0.2  0:03.59 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
87267 root  20  0  302M 27092 S  0.0  0.2  0:02.20 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
35915 root  20  0  303M 26996 S  0.0  0.2  0:01.11 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53040 root  20  0  303M 26652 S  0.0  0.2  0:00.37 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
94694 root  20  0  333M 27520 S  0.0  0.2  0:00.28 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
96641 root  20  0  302M 25976 S  0.0  0.2  0:00.13 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53014 root  20  0  283M 24968 S  0.0  0.2  0:00.14 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
73679 root  20  0  303M 26684 S  0.0  0.2  0:00.12 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
18229 root  20  0  303M 26496 S  0.0  0.2  0:00.04 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
31119 root  80  0  286M 25360 R 25.4  0.2  0:00.00 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf


In this htop output, the first process (53010) is the one consuming too much CPU. I do not know where it originates.
All other processes disappear after a short time, which is the expected behavior.

Killing the problematic process breaks the Samba service. Restarting Samba restores service, but the problematic process comes back as well.

From the reporting graphs I know the problem began 2 months ago and is present all the time (24/7). What makes it more mysterious is that every morning between 7:40 and 8:00 there is a short period of 2-5 minutes where the CPU load returns to the expected level. The system has been up and running for 2.5 years now with absolutely no problems so far, serving ~20 Windows computers, some Raspberry Pis and 2 Macs.

The only suspicion I have (after days of trying all the tips and tricks I could find in forums and elsewhere, without result) is that a specific client user or computer is causing this.
I also scrolled through the bug reports but could not find anything useful or anything that solved the problem.

How can I determine which user or computer spawned this specific Samba process?

The server is responding fast and there is no other problem, but my suspicion is that something must be wrong somewhere, and I can only sleep quietly once it is solved.... :)
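
One possible way to approach this on a FreeBSD-based system such as FreeNAS is to list the established connections on the SMB port with `sockstat -4 -c -p 445` and group the remote addresses by the smbd PID that owns them. The sketch below is only an illustration: it assumes the usual seven-column sockstat layout (USER, COMMAND, PID, FD, PROTO, LOCAL ADDRESS, FOREIGN ADDRESS), the function name `remotes_by_pid` is hypothetical, and the sample rows, addresses and PIDs are made up rather than taken from this system.

```python
from collections import defaultdict


def remotes_by_pid(sockstat_lines, command="smbd"):
    """Map each PID of `command` to the sorted remote hosts it talks to."""
    result = defaultdict(set)
    for line in sockstat_lines:
        fields = line.split()
        # Expected columns: USER COMMAND PID FD PROTO LOCAL FOREIGN.
        # The header row splits into more than 7 fields and is skipped here.
        if len(fields) != 7 or fields[1] != command or not fields[2].isdigit():
            continue
        foreign = fields[6]
        if foreign == "*:*":  # listening socket, no peer yet
            continue
        host, _, _port = foreign.rpartition(":")
        result[int(fields[2])].add(host)
    return {pid: sorted(hosts) for pid, hosts in result.items()}


# Made-up sample input (documentation addresses), NOT output from this server.
SAMPLE = """\
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     smbd       1001  35 tcp4   192.0.2.10:445        192.0.2.50:51632
root     smbd       1001  36 tcp4   192.0.2.10:445        192.0.2.51:49877
root     smbd       1000  30 tcp4   *:445                 *:*
root     sshd       900   4  tcp4   192.0.2.10:22         192.0.2.60:50222
""".splitlines()

for pid, hosts in remotes_by_pid(SAMPLE).items():
    print(pid, hosts)
```

A PID that never shows up with a peer address (like the listening row in the sample) would be worth checking separately, since it may be the parent daemon rather than a per-client process.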
 

tarabas

Cadet
Joined
Jun 26, 2013
Messages
7
Thanks!
I already tried that, but the process in question is not listed by smbstatus at all. That is part of the strangeness.
However, ps reveals something:
Code:
ps -A | grep smbd
33693  -  I  0:00.61 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
35915  -  S  0:01.17 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
37368  -  S  0:03.15 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53010  -  RLs  36:43.10 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53014  -  S  0:00.27 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53015  -  S  3:01.41 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
53040  -  I  0:01.17 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
54356  -  I  0:01.01 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
58856  -  S  0:11.89 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
61799  -  I  0:06.23 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
73679  -  I  0:00.13 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
75750  -  R  0:00.00 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
78490  -  I  0:01.89 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
87267  -  S  0:02.30 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
94694  -  I  0:04.59 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf
96641  -  S  0:00.38 /usr/local/sbin/smbd --daemon --configfile=/usr/local/etc/smb4.conf

The process in question is still 53010, whose state keeps changing between RLs, Ds, Rs and Ss.
This means the process is either (R)unnable, (S)leeping or in uninterruptible (D)isk wait; it may have pages (L)ocked in memory, and it is a (s)ession leader.
What does this tell me?
Is there another way, besides smbstatus, to identify the client/user that spawned the evil process?
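
For reference, these state strings can be decoded mechanically. The following is a minimal sketch based on the state letters documented in the FreeBSD ps(1) manual page; the function name `decode_stat` is my own, and the tables cover only a subset of the documented letters. Note that `L` means something different in the first position (waiting for a lock) than as a trailing flag (pages locked in core).

```python
# First character of the STAT column: the run state.
RUN_STATE = {
    "D": "uninterruptible (usually disk) wait",
    "I": "idle (sleeping longer than about 20 seconds)",
    "L": "waiting to acquire a lock",
    "R": "runnable",
    "S": "sleeping (less than about 20 seconds)",
    "T": "stopped",
    "Z": "zombie",
}

# Remaining characters: additional flags (subset only).
FLAGS = {
    "L": "pages locked in core",
    "s": "session leader",
    "+": "in the foreground process group of its terminal",
    "<": "raised scheduling priority",
    "N": "reduced scheduling priority",
    "J": "in a jail",
    "X": "being traced or debugged",
}


def decode_stat(stat):
    """Return a list of descriptions for a ps STAT string such as 'RLs'."""
    if not stat:
        raise ValueError("empty state string")
    parts = [RUN_STATE.get(stat[0], f"unknown state {stat[0]!r}")]
    parts += [FLAGS.get(ch, f"unknown flag {ch!r}") for ch in stat[1:]]
    return parts


for stat in ("RLs", "Ds", "Rs", "Ss", "I"):
    print(stat, "->", ", ".join(decode_stat(stat)))
```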
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Pull the network cable and restart samba. Does the problem disappear???? Boom! Nailed it! :D

PM me a debug file (system->advanced->save debug) and I'll try to find some time to review it tomorrow. If possible, disconnect all known-good clients then ratchet Samba logging (Services->SMB) to "Debug" before generating the debug file.
 

tarabas

Cadet
Joined
Jun 26, 2013
Messages
7
Hm, pulling the cable is not that trivial - this is a live system in a production environment.
The clients are scattered around the building and some run tasks that do not like interruptions... taking clients offline and back online may therefore take some days.
Before posting the debug file I will screen it a bit myself - but I will certainly come back with it or with questions :)
Thanks for pointing me in that direction.
 

millst

Contributor
Joined
Feb 2, 2015
Messages
141
I had this same problem just a few days ago. Turns out one of my clients was running amok, but it's a small home network, so it was easy to find the source. It would have been easier if smbstatus had displayed the connection info.

-tm
 

tarabas

Cadet
Joined
Jun 26, 2013
Messages
7
Stupid me! tcpdump revealed the mysterious client very easily.
Code:
10:06:31.487546 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51632: Flags [.], ack 43, win 2058, options [nop,nop,TS val 6892317 ecr 168401784], length 0
10:06:31.492442 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51632: Flags [F.], seq 1, ack 43, win 2058, options [nop,nop,TS val 6892321 ecr 168401784], length 0
10:06:31.492909 IP srv-sql0.fritz.box.51632 > sglx-srv1.fritz.box.microsoft-ds: Flags [F.], seq 43, ack 2, win 229, options [nop,nop,TS val 168401785 ecr 6892321], length 0
10:06:31.492932 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51632: Flags [.], ack 44, win 2058, options [nop,nop,TS val 6892322 ecr 168401785], length 0
10:06:31.492936 IP srv-sql0.fritz.box.51634 > sglx-srv1.fritz.box.microsoft-ds: Flags [S], seq 3385250625, win 29200, options [mss 1460,sackOK,TS val 168401785 ecr 0,nop,wscale 7], length 0
10:06:31.492950 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51634: Flags [S.], seq 1861161364, ack 3385250626, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3630028114 ecr 168401785], length 0
10:06:31.493116 IP srv-sql0.fritz.box.51634 > sglx-srv1.fritz.box.microsoft-ds: Flags [.], ack 1, win 229, options [nop,nop,TS val 168401785 ecr 3630028114], length 0
10:06:31.493151 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51634: Flags [.], ack 1, win 2058, options [nop,nop,TS val 3630028114 ecr 168401785], length 0
10:06:31.493264 IP srv-sql0.fritz.box.51634 > sglx-srv1.fritz.box.microsoft-ds: Flags [P.], seq 1:43, ack 1, win 229, options [nop,nop,TS val 168401785 ecr 3630028114], length 42SMB PACKET: SMBecho (REQUEST)

These packets occurred in large numbers!
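
To put a number on this, one option is to count the captured packets per sending host. The sketch below assumes the default one-line tcpdump text format shown above; the function names are hypothetical. It counts only packets whose destination is the SMB port (printed as `microsoft-ds` or `445`), so the hosts it reports are the ones sending to the server. In the excerpt, the packet at 10:06:31.492936 that opens the new connection from port 51634 goes from srv-sql0 to the microsoft-ds port on sglx-srv1, which suggests that srv-sql0 is the side initiating these connections.

```python
import re
from collections import Counter

# "<timestamp> IP <src-host.port> > <dst-host.port>: ..."
LINE_RE = re.compile(r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+): ")
SMB_PORTS = {"microsoft-ds", "445"}


def split_endpoint(endpoint):
    """Split 'host.port' at the last dot into (host, port)."""
    host, _, port = endpoint.rpartition(".")
    return host, port


def count_smb_senders(lines):
    """Count packets per source host that are addressed to the SMB port."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not an IPv4 packet summary line
        src_host, _ = split_endpoint(match["src"])
        _, dst_port = split_endpoint(match["dst"])
        if dst_port in SMB_PORTS:
            counts[src_host] += 1
    return counts


# Three lines copied from the capture above.
CAPTURE = [
    "10:06:31.492909 IP srv-sql0.fritz.box.51632 > sglx-srv1.fritz.box.microsoft-ds: Flags [F.], seq 43, ack 2, win 229, options [nop,nop,TS val 168401785 ecr 6892321], length 0",
    "10:06:31.492950 IP sglx-srv1.fritz.box.microsoft-ds > srv-sql0.fritz.box.51634: Flags [S.], seq 1861161364, ack 3385250626, win 65535, options [mss 1460,nop,wscale 6,sackOK,TS val 3630028114 ecr 168401785], length 0",
    "10:06:31.493116 IP srv-sql0.fritz.box.51634 > sglx-srv1.fritz.box.microsoft-ds: Flags [.], ack 1, win 229, options [nop,nop,TS val 168401785 ecr 3630028114], length 0",
]

for host, packets in count_smb_senders(CAPTURE).most_common():
    print(host, packets)
```

The same function could be fed the lines of a saved capture, for example one written with `tcpdump -n port 445 > capture.txt`; with `-n` the hosts appear as IP addresses instead of names.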

FreeNAS is running as sglx-srv1. The machine srv-sql0 is a standard Ubuntu server providing MySQL services; it has Webmin running and nothing else.

It looks like FreeNAS is poking the Ubuntu machine with microsoft-ds packets. Why? What for?

On FreeNAS only AFP/RSYNC/CIFS/SSH are active, and Samba is configured for a workgroup only (I hope so) and is the master browser.

What may be wrong here or there? The good thing is - no bug in FreeNAS :)
 

tarabas

Cadet
Joined
Jun 26, 2013
Messages
7
After taking down the Ubuntu server mentioned above, the unwanted CPU load was gone.
I switched it back on, but the load did not reappear. Re-checked: Samba is not installed on that machine at all.

Anyway, I restarted the second Ubuntu server, which does run Samba. Nothing unusual is happening on FreeNAS.
In that server's Samba config I changed "master browser" from "auto" to "off" to be safe. FreeNAS is still happy.

Problem solved - even if the original cause remains a little unclear. I will reopen the thread if the issue reappears.

Thanks all!
 