FreeNAS locked at high CPU usage while idle

Fraoch · Jul 22, 2015

This is weird - take a look at this top output:

Code:

last pid: 15031;  load averages:  2.25,  2.17,  2.13  up 0+09:14:22  08:39:59
83 processes:  3 running, 80 sleeping
CPU: 51.0% user,  0.0% nice,  1.1% system,  0.0% interrupt, 47.8% idle
Mem: 170M Active, 503M Inact, 13G Wired, 2600K Cache, 1766M Free
ARC: 12G Total, 7561M MFU, 3928M MRU, 672K Anon, 62M Header, 667M Other
Swap: 8192M Total, 284K Used, 8191M Free

  PID USERNAME  THR PRI NICE  SIZE  RES STATE  C  TIME  WCPU COMMAND
12989 root  1 103  0 16116K  8416K CPU0  0 515:50 100.00% bzip2
49184 root  1 103  0 85372K 11532K CPU1  1 334:17 100.00% python2.7
15026 root  1  21  0 67400K 20932K uwait  1  0:00  0.49% dtrace
 3313 root  12  20  0  177M 11636K uwait  1  0:58  0.00% collectd
 2212 root  1  20  0 56676K  4292K select  2  0:51  0.00% snmpd
 2208 root  6  25  0 99316K  7812K select  2  0:46  0.00% python2.7
 2238 root  1  20  0 37584K  2260K nanslp  3  0:15  0.00% zpool
 2102 root  4  20  0  9908K  1516K rpcsvc  0  0:11  0.00% nfsd
 3140 root  1  52  0  198M 36036K select  0  0:10  0.00% python2.7
 3136 root  6  20  0  386M  107M usem  1  0:07  0.00% python2.7
 5340  30001  1  20  0  223M 63980K select  3  0:06  0.00% perl5.18.
 2526 root  1  20  0 12044K  1472K select  3  0:02  0.00% powerd
 1895 root  1  20  0 90328K  5464K kqread  1  0:01  0.00% syslog-ng
 5999 root  1  52  0  171M 12452K ttyin  2  0:01  0.00% python2.7
 3595 root  4  52  0  176M 11900K select  2  0:01  0.00% python2.7

bzip2 and python 2.7 are taking up about 50% of my CPU, and my FreeNAS is idle.

The reporting tab shows it's been this way for at least 24 hours.

This is after a reboot. I am tuning my new Sophos UTM firewall and it somehow cut this FreeNAS off the network (still trying to figure that out) so I rebooted the FreeNAS server. I had gone onto it to check the log files and see what happened when I find this.

The log file shows nothing unusual:

Code:

Jul 22 00:00:00 Minas-Tirith newsyslog[12900]: logfile turned over due to size>100K
Jul 22 00:00:00 Minas-Tirith syslog-ng[1895]: Configuration reload request received, reloading configuration;
Jul 22 00:30:08 Minas-Tirith afpd[18750]: Login by nobody (AFP3.4)
Jul 22 00:30:08 Minas-Tirith afpd[18750]: afp_zzz: entering extended sleep
Jul 22 00:41:12 Minas-Tirith afpd[18750]: AFP logout by nobody
Jul 22 00:41:12 Minas-Tirith afpd[18750]: AFP statistics: 283030.10 KB read, 204484.96 KB written
Jul 22 00:41:12 Minas-Tirith afpd[18750]: done
Jul 22 08:00:04 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs snapshot "volume1@auto-20150722.0800-1m"
Jul 22 08:00:04 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs snapshot "volume1/Aileens_Storage@auto-20150722.0800-2d"
Jul 22 08:00:04 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs snapshot "volume1@auto-20150722.0800-2d"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs get -H freenas:state "volume1/Aileens_Storage@auto-20150720.0800-2d"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs destroy -r -d "volume1/Aileens_Storage@auto-20150720.0800-2d"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs get -H freenas:state "volume1@auto-20150720.0800-2d"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs destroy -r -d "volume1@auto-20150720.0800-2d"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs get -H freenas:state "volume1@auto-20150622.0800-1m"
Jul 22 08:00:05 Minas-Tirith autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs destroy -r -d "volume1@auto-20150622.0800-1m"

For a second I thought it might be turning over the old logfile, but it has been locked in a state of high CPU usage way before this time.

FreeNAS dropping off the network and this high CPU usage seem related. Yesterday morning it spontaneously jumped to 80% CPU and stayed there all day until I found it was no longer on the network and rebooted it. It resumed at normal idle CPU, then jumped up again at the first file transfer, staying at 30% CPU for about 3 hours, then jumping to 50% CPU where it's staying. However this first file transfer occurred at the exact same time it was turning over the log file...

Could it be compressing the log and hanging? In trying to see older logfiles I seemed to crash nano when I tried to view /var/log/messages.0. I can't open it. Wouldn't that be the older logfile bzip2 is trying to compress?

Any suggestions?

dlavigne · Jul 22, 2015

Is bzip still running? If so, what is the output of procstat -c 12989 (or whatever the current PID is)?

Fraoch · Jul 22, 2015

Thanks for the response. It didn't look like it was resolving itself so rather than just killing the process I rebooted the server. It's back to normal, but if this is a problem compressing the log, it will reappear at midnight. I'll post the output tomorrow morning if this is the case.

Fraoch · Jul 24, 2015

I haven't had the bzip2 process reappear, but that hanging python process keeps popping up and taking up 100% of one of my CPU cores.

I suspect it might be due to network issues. I've been having network problems lately - first configuring an aggressive firewall that knocked FreeNAS off the network, then an Internet outage. I've resolved both of these but I think I see how the hanging process started in the log - the timing coincides with a period of high, sustained CPU usage:

Code:

Jul 24 01:32:08 Minas-Tirith update_check.py: [freenasOS.Configuration:567] Unable to load http://update.freenas.org/FreeNAS/trains.txt: <urlopen error [Errno 8] hostname nor servname provided, or not known>
Jul 24 01:32:18 Minas-Tirith update_check.py: [freenasOS.Configuration:567] Unable to load http://update.freenas.org/FreeNAS/FreeNAS-9.3-STABLE/LATEST: <urlopen error [Errno 8] hostname nor servname provided, or not known>
Jul 24 01:32:18 Minas-Tirith update_check.py: [freenasOS.Update:664] Unable to find latest manifest for train FreeNAS-9.3-STABLE
Jul 24 01:32:28 Minas-Tirith update_check.py: [freenasOS.Configuration:567] Unable to load http://update.freenas.org/FreeNAS/trains.txt: <urlopen error [Errno 8] hostname nor servname provided, or not known>
Jul 24 01:36:36 Minas-Tirith ntpd_initres[6146]: host name not found: 0.freebsd.pool.ntp.org
Jul 24 01:36:46 Minas-Tirith ntpd_initres[6146]: host name not found: 1.freebsd.pool.ntp.org
Jul 24 01:36:56 Minas-Tirith ntpd_initres[6146]: host name not found: 2.freebsd.pool.ntp.org
Jul 24 01:53:06 Minas-Tirith ntpd_initres[6146]: host name not found: 0.freebsd.pool.ntp.org
Jul 24 01:53:16 Minas-Tirith ntpd_initres[6146]: host name not found: 1.freebsd.pool.ntp.org
Jul 24 01:53:26 Minas-Tirith ntpd_initres[6146]: host name not found: 2.freebsd.pool.ntp.org
Jul 24 02:09:36 Minas-Tirith ntpd_initres[6146]: host name not found: 0.freebsd.pool.ntp.org
Jul 24 02:09:46 Minas-Tirith ntpd_initres[6146]: host name not found: 1.freebsd.pool.ntp.org
Jul 24 02:09:56 Minas-Tirith ntpd_initres[6146]: host name not found: 2.freebsd.pool.ntp.org
Jul 24 02:15:09 Minas-Tirith freenas[39129]: Traceback (most recent call last):
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/www/freenasUI/common/system.py", line 182, in send_mail
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  local_hostname=local_hostname)
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/lib/python2.7/smtplib.py", line 781, in __init__
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  SMTP.__init__(self, host, port, local_hostname, timeout)
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/lib/python2.7/smtplib.py", line 251, in __init__
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  (code, msg) = self.connect(host, port)
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/lib/python2.7/smtplib.py", line 311, in connect
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  self.sock = self._get_socket(host, port, self.timeout)
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/lib/python2.7/smtplib.py", line 786, in _get_socket
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  new_socket = socket.create_connection((host, port), timeout)
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  File "/usr/local/lib/python2.7/socket.py", line 553, in create_connection
Jul 24 02:15:09 Minas-Tirith freenas[39129]:  for res in getaddrinfo(host, port, 0, SOCK_STREAM):
Jul 24 02:15:09 Minas-Tirith freenas[39129]: gaierror: [Errno 8] hostname nor servname provided, or not known
Jul 24 02:26:06 Minas-Tirith ntpd_initres[6146]: host name not found: 0.freebsd.pool.ntp.org
Jul 24 02:26:16 Minas-Tirith ntpd_initres[6146]: host name not found: 1.freebsd.pool.ntp.org
Jul 24 02:26:26 Minas-Tirith ntpd_initres[6146]: host name not found: 2.freebsd.pool.ntp.org

The high CPU usage coincided with the time of the traceback.

Is this a bug? If FreeNAS can't contact the NTP pool it should just fail gracefully.

SweetAndLow · Jul 24, 2015

What is your hardware, any jails and how are things configured on your datasets?

Fraoch · Jul 25, 2015

Hardware is in my signature, but if you missed it:

FreeNAS 9.3
Supermicro X10SLM-F with Intel Xeon E3-1220 v3 and 2 X 8 GB Crucial 1.35 V DDR3-1600 ECC
4 X Western Digital Red 2 TB WD20EFRX in RAID-Z2, FreeNAS on 64 GB SanDisk SSD + 64 GB Crucial M4 SSD (mirror)
SeaSonic SS-660 XP2
APC Smart-UPS C 1500
Phanteks Enthoo Pro with StarTech HSB430SATBK 4-bay hotswap backplane

I do have two jails, only one is running. The jail contains Logitech Media Server, configured like so:

https://forums.freenas.org/index.ph...build-and-install-lms-7-9-from-scratch.27161/

One of the jails was for compiling and installation, the other one actually runs the server. It's a very well-behaved, low-resource server. It's running fine on my FreeNAS.

Datasets are simple, configured like so:

That python process is still hung. I'm rebooting the server now that the network is back to normal, I'll see if it pops up again.

Fraoch · Jul 27, 2015

Well I rebooted with the network back up and the problem hasn't reappeared.

I strongly suspect it's related to the outside network being down, and in fact I was thinking that maybe bzip2 was having problems trying to compress log files with ntpd entries once every 10 seconds.

However I don't think I have enough evidence to file a bug.

jimnastics · Mar 4, 2016

Sorry for a bit of a necro-bump but I'm experiencing exactly the same problem, 100% CPU usage from python2.7 and bzip2 processes. Fraoch, did you get to the bottom of this? I will post more details from my side over the weekend.

Fraoch · Mar 4, 2016

It seems to occur if FreeNAS starts up without network access (i.e. a cable is unplugged or you have a misconfigured firewall blocking access). My FreeNAS and my network have been so stable lately I haven't seen it in a while so I don't know if it still occurs with recent builds.

Important Announcement for the TrueNAS Community.

FreeNAS locked at high CPU usage while idle

Fraoch

Patron

dlavigne

Guest

Fraoch

Patron

Fraoch

Patron

SweetAndLow

Sweet'NASty

Fraoch

Patron

Fraoch

Patron

jimnastics

Cadet

Fraoch

Patron

Similar threads