FreeNAS 11 Constant High CPU from Python3.6 process

Status
Not open for further replies.

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
FreeNAS-11.0-U2 (e417d8aa5). No cron jobs, nothing running, no jails.

Top looks like this all the time for days on end:

Code:
PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND   
 9614 root		  1  83	0   251M 59096K CPU0	0   0:05  98.14% python3.6 
18825 root		  2  22	0   127M 32492K select  3   2:32  29.39% python3.6 
 6359 root		 15  20	0   460M   131M umtxn   3   1:50   5.82% uwsgi	 
 5524 root		 12  20	0 47936K 21308K uwait   2   3:12   0.27% consul	
 9422 root		  1  20	0 24272K  3448K CPU1	1   0:00   0.19% top	   
 5528 root		 12  20	0 40432K 11680K uwait   1   0:10   0.18% consul-al 
 3989 root		  1  20	0  9560K  5240K select  1   1:12   0.15% devd	  
 8784 root		  3  20	0 72124K 16820K kqread  3   0:47   0.12% syslog-ng 
 8670 www		   1  20	0 30980K  6264K kqread  0   0:02   0.10% nginx	 
 5258 root		  1  22	0 13136K  3556K wait	0   0:01   0.08% sh		
 6237 root		  1  20	0 49912K  8124K select  3   0:37   0.08% zfsd	  
 6177 root		  8  22	0 39708K 12004K uwait   1   0:00   0.02% consul	
 4653 root		  1  20	0 10420K  2136K select  1   0:15   0.01% powerd	
 4754 uucp		  1  20	0 16988K  2784K select  1   0:11   0.01% usbhid-up 
 4764 uucp		  1  20	0 16812K  2580K nanslp  0   0:02   0.01% upsmon


What's going on?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
FreeNAS-11.0-U2 (e417d8aa5). No cron jobs, nothing running, no jails.

Top looks like this all the time for days on end:

Code:
PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND   
 9614 root		  1  83	0   251M 59096K CPU0	0   0:05  98.14% python3.6 
18825 root		  2  22	0   127M 32492K select  3   2:32  29.39% python3.6 
 6359 root		 15  20	0   460M   131M umtxn   3   1:50   5.82% uwsgi	 
 5524 root		 12  20	0 47936K 21308K uwait   2   3:12   0.27% consul	
 9422 root		  1  20	0 24272K  3448K CPU1	1   0:00   0.19% top	   
 5528 root		 12  20	0 40432K 11680K uwait   1   0:10   0.18% consul-al 
 3989 root		  1  20	0  9560K  5240K select  1   1:12   0.15% devd	  
 8784 root		  3  20	0 72124K 16820K kqread  3   0:47   0.12% syslog-ng 
 8670 www		   1  20	0 30980K  6264K kqread  0   0:02   0.10% nginx	 
 5258 root		  1  22	0 13136K  3556K wait	0   0:01   0.08% sh		
 6237 root		  1  20	0 49912K  8124K select  3   0:37   0.08% zfsd	  
 6177 root		  8  22	0 39708K 12004K uwait   1   0:00   0.02% consul	
 4653 root		  1  20	0 10420K  2136K select  1   0:15   0.01% powerd	
 4754 uucp		  1  20	0 16988K  2784K select  1   0:11   0.01% usbhid-up 
 4764 uucp		  1  20	0 16812K  2580K nanslp  0   0:02   0.01% upsmon


What's going on?
Can you give us a little bit of background on what Hardware you have in there?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Are you sure nothing is going on? Have you rebooted?
an you give us a little bit of background on what Hardware you have in there?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
You can't see his hardware in his tagline, the downside to using a cellphone:
ASRock Rack C2550D4I (Failed twice)
SeaSonic SSR-450RM
Cyberpower CP1000PFCLCD
6 Western Digital 2TB WD20EZRX
16GB Crucial ECC CT2KIT102472BD160B
Fractal Design Node 304
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
Are you sure nothing is going on? Have you rebooted?

Nothing's going on that I initiated, anyway.

Rebooted just a few minutes ago.

Code:
PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND   
 3992 root		  2  22	0   127M 31988K select  0   0:11  98.80% python3.6 
 3731 root		 15  20	0   448M   121M umtxn   3   0:08   7.42% uwsgi	 
 4032 root		  1  20	0 24272K  3368K CPU3	3   0:00   0.88% top	   
 2895 root		 11  20	0 45888K 18308K kqread  2   0:01   0.18% consul	
 2126 uucp		  1  20	0 16988K  2784K select  2   0:00   0.08% usbhid-up 
 2811 www		   1  20	0 30980K  6012K kqread  0   0:00   0.08% nginx	 
 2025 root		  1  20	0 10420K  2136K select  0   0:00   0.01% powerd	
 2874 root		  1  20	0   407M 98816K kqread  1   0:15   0.01% uwsgi	 
  210 root		  5  32	0   395M 93140K kqread  3   0:14   0.01% python3.6 
 2863 root		  1  52	0   297M 64584K select  2   0:15   0.00% python3.6 
 3679 root		  1  52	0   236M 56588K ttyin   0   0:05   0.00% python3.6 
 2978 root		 12  20	0   227M 31332K nanslp  3   0:02   0.00% collectd  
 3068 root		  6  52	0   104M 19512K usem	3   0:01   0.00% python3.6 
 2022 root		  2  20	0 24704K 12544K select  3   0:00   0.00% ntpd	  
 2597 root		  1  52	0 27012K  4416K nanslp  1   0:00   0.00% smartd
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What web browser are you using?
Are you using something like Putty and SSH to get into FreeNAS and do the top command or using the GUI shell? Don't use the GUI shell.

It looks like to me that if I use FireFox that Python eats up some CPU time, I.E and Chrome on my system didn't. I'm using a Windoze 7 system. I get .02% with no GUI open.

If you do not have the GUI open and are using SSH and still get this result, I'm not sure what is going on then.
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
Are you sure nothing is going on? Have you rebooted?

You can't see his hardware in his tagline, the downside to using a cellphone:
ASRock Rack C2550D4I (Failed twice)
SeaSonic SSR-450RM
Cyberpower CP1000PFCLCD
6 Western Digital 2TB WD20EZRX
16GB Crucial ECC CT2KIT102472BD160B
Fractal Design Node 304
I can't help with the original post, but on my phone I notice that when I turn it 90 degree to widescreen, the footers/tagline become visible.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Can you run both top again, and ps -ax | grep "[p]ython" so we can see which python script is using up all your cpu.
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
Interesting. When I run top I see the Python3.6 process with its PID, but by the time I CTRL+C and run PS the PID is gone. If I wait a moment there is a new Python3.6 process eating CPU with a new, higher PID. It's like new processes keep spawning every few seconds. It's killing my CPU.

PQBKPwU.png


Edit - I was finally able to "catch" it. Commands run from SSH despite the Python script appearing to be for the WebUI.

Code:
PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
63708 root		  1  78	0   130M 38604K CPU1	1   0:02 104.90% python3.6
63700 root		  1  20	0 24272K  3448K CPU0	0   0:00   0.46% top
 2895 root		 12  20	0 49984K 20460K uwait   3   2:00   0.15% consul
 3992 root		  2  20	0   127M 32040K select  0   3:25   0.03% python3.6
62871 root		  1  20	0 82848K  7256K select  3   0:00   0.03% sshd
 2126 uucp		  1  20	0 16988K  2784K select  0   0:07   0.02% usbhid-up
 2025 root		  1  20	0 10420K  2136K select  3   0:10   0.01% powerd
 2022 root		  1  20	0 24700K 12524K select  2   0:03   0.01% ntpd
 2874 root		  1  20	0   407M 98816K kqread  1   0:17   0.01% uwsgi
 6236 root		  1  20	0   267M 13100K select  2   0:00   0.00% winbindd
 2128 uucp		  1  20	0 29104K  2612K select  2   0:02   0.00% upsd
 6649 root		 12  20	0   221M 28124K nanslp  2   6:34   0.00% collectd
 2863 root		  1  52	0   297M 64892K select  3   1:48   0.00% python3.6
 3731 root		 15  20	0   451M   127M umtxn   3   1:04   0.00% uwsgi
  210 root		  5  20	0   397M 95648K kqread  1   0:19   0.00% python3.6


Code:
root@freenas:~ # ps -ax | grep "[p]ython"
  210  -  I	   0:19.41 python3.6: middlewared (python3.6)
 2863  -  I	   1:48.37 python: alertd (python3.6)
 3992  -  S	   3:24.56 python: webshelld (python3.6)
62675  -  I	   0:00.54 /usr/local/bin/python /usr/local/libexec/nas/register
63708  -  Rs	  0:04.66 /usr/local/bin/python /usr/local/www/freenasUI/tools/
 3679 v0  Is+	 0:04.75 /usr/local/bin/python /etc/netcli (python3.6)
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
It's different scripts each time I catch it. This time it was:

Code:
  PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
64126 root		  1  80	0   225M 49100K CPU2	2   0:03  95.05% python3.6
64118 root		  1  20	0 24272K  3444K CPU1	1   0:00   0.54% top
 2895 root		 12  20	0 49984K 20472K piperd  3   2:01   0.38% consul
 3992 root		  2  20	0   127M 32040K select  2   3:25   0.10% python3.6
62871 root		  1  20	0 82848K  7268K select  3   0:00   0.03% sshd
 2126 uucp		  1  20	0 16988K  2784K select  1   0:07   0.03% usbhid-up
 2025 root		  1  20	0 10420K  2136K select  3   0:10   0.01% powerd
 2022 root		  1  20	0 24700K 12524K select  3   0:03   0.01% ntpd
 2874 root		  1  20	0   407M 98816K kqread  3   0:17   0.01% uwsgi
 6649 root		 12  20	0   221M 28204K nanslp  3   6:37   0.00% collectd
 2863 root		  1  52	0   297M 64892K select  1   1:49   0.00% python3.6
 3731 root		 15  20	0   451M   127M kqread  0   1:06   0.00% uwsgi
  210 root		  5  20	0   397M 95648K kqread  1   0:19   0.00% python3.6
 2899 root		 12  20	0 34288K 11320K uwait   0   0:06   0.00% consul-al
 3679 root		  1  52	0   236M 56588K ttyin   0   0:05   0.00% python3.6
root@freenas:~ # ps -ax | grep "[p]ython"
  210  -  I	   0:19.44 python3.6: middlewared (python3.6)
 2863  -  I	   1:49.03 python: alertd (python3.6)
 3992  -  S	   3:24.69 python: webshelld (python3.6)
62675  -  I	   0:00.54 /usr/local/bin/python /usr/local/libexec/nas/register
64126  -  R	   0:04.70 /usr/local/bin/python /usr/local/bin/alertcli.py (pyt
 3679 v0  Is+	 0:04.75 /usr/local/bin/python /etc/netcli (python3.6)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
All I can imagine is its running off the rails somehow.

Maybe try a bug report. Do it from the GUI so that you can attach a debug report.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
The Python module autorepl.py gets called every minute on my system, as can be seen in /var/log/debug.log. Another candidate is autosnap.py. Using top -a might help to diagnose.

Code:
~ # grep "Autosnap replication started" /var/log/debug.log | tail -5
Aug 25 12:37:10 blunzn /autorepl.py: [tools.autorepl:221] Autosnap replication started
Aug 25 12:38:10 blunzn /autorepl.py: [tools.autorepl:221] Autosnap replication started
Aug 25 12:39:10 blunzn /autorepl.py: [tools.autorepl:221] Autosnap replication started
Aug 25 12:40:10 blunzn /autorepl.py: [tools.autorepl:221] Autosnap replication started
Aug 25 12:41:10 blunzn /autorepl.py: [tools.autorepl:221] Autosnap replication started


Of course this is not killing any CPU. The average load in FreeNAS 11 is higher than it used to be in 10.3.
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
All I can imagine is its running off the rails somehow.

Maybe try a bug report. Do it from the GUI so that you can attach a debug report.
How long should it take? It's been on "Generating debug info" for over 20 minutes. After 30 minutes the Web UI stopped responding. Attempts to load any other page just gave me "Sorry, an error occurred."
 
Last edited:

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
Of course this is not killing any CPU. The average load in FreeNAS 11 is higher than it used to be in 10.3.
Something is making me average between 50% and 85% for days on end with no connected users, no active jails or plugins, and no user-created cron jobs. Just idle. That's insane.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Something is making me average between 50% and 85% for days on end with no connected users, no active jails or plugins, and no user-created cron jobs. Just idle. That's insane.

What does the System Reporting -> Memory tab show?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
How long should it take? It's been on "Generating debug info" for over 20 minutes. After 30 minutes the Web UI stopped responding. Attempts to load any other page just gave me "Sorry, an error occurred."

This is probably a related issue... and you should report a bug ;)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
How long should it take? It's been on "Generating debug info" for over 20 minutes. After 30 minutes the Web UI stopped responding. Attempts to load any other page just gave me "Sorry, an error occurred."
Becoming unresponsive like you describe, that would make me think about a hardware problem. Are you booting from a USB stick?

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
I think it's an issue with my SATADOM. I'm now getting a ton of errors on scrub. The only thing stored on it is the system dataset. I'm surprised this thing has failed after only 12,000 hours powered on and 50 power cycles. I guess I'll use an SSD instead.

Scrub of boot shows 0 errors so far, but it is a USB.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
What are you using the SATADOM for if not booting?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I think it's an issue with my SATADOM. I'm now getting a ton of errors on scrub. The only thing stored on it is the system dataset. I'm surprised this thing has failed after only 12,000 hours powered on and 50 power cycles. I guess I'll use an SSD instead.

Scrub of boot shows 0 errors so far, but it is a USB.
If you are getting errors on a scrub, it is absolutely a problem. if possible, move the system dataset to other storage.
That might clear the problem.
Let us know.

Sent from my SAMSUNG-SGH-I537 using Tapatalk
 

FreeNASBob

Patron
Joined
Aug 23, 2014
Messages
226
What are you using the SATADOM for if not booting?
System dataset. I don't want to keep it on a spinning platter because it prevents the drives from spinning down, and my goal is to minimize power consumption. The NAS is used infrequently and a constantly spinning drive is a waste 23.5 hours a day.
 
Status
Not open for further replies.
Top