11.1 Slow after a couple of days. Reverted to 11.0-U4

Status
Not open for further replies.

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
I tried to VNC into the single virtual machine I have running on FreeNAS and was unable to do anything. Once that occurred, I figured the virtual machine had locked up, as I have been seeing weird issues with it. However, I then tried to log in to the web UI for FreeNAS and was unable to do that either. Finally I was able to get in via SSH and saw my load average was 13, 20, 21.

The main processes that were running were Python 3.6, which you can see below; these processes seem to correspond to middlewared. I was able to check Netdata, and all of this seems to have started at 7:25pm on 12/19. Is there any way I can track what is happening and what occurred at 7:25?
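
In case it helps anyone answer, my plan is to grep the logs around that time. The paths below are just the stock FreeBSD/FreeNAS log locations, so this is only a rough starting point:

Code:
# anything logged around 7:25pm on Dec 19 in the system log
grep "Dec 19 19:2" /var/log/messages
# middlewared keeps its own log as well, which may show what it was doing
tail -n 200 /var/log/middlewared.log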

Thanks!

Code:
last pid: 73053;  load averages:  1.79,  3.48,  9.66																		up 4+22:02:09  08:15:58
106 processes: 1 running, 104 sleeping, 1 zombie
CPU:  4.5% user,  0.1% nice, 15.7% system,  0.1% interrupt, 79.6% idle
Mem: 290M Active, 117M Inact, 60M Laundry, 15G Wired, 158M Free
ARC: 1580M Total, 808M MFU, 464M MRU, 6176K Anon, 24M Header, 278M Other
	 976M Compressed, 2427M Uncompressed, 2.49:1 Ratio
Swap: 8192M Total, 4189M Used, 4003M Free, 51% Inuse, 4140K In, 5196K Out

  PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
  245 root		946  20	0  1053M 78096K uwait   3 340:49 108.97% python3.6
84344 root		 38  20	0  2121M 39020K kqread  0  37.2H  43.74% bhyve
54029 netdata	  16  52   19   474M 17604K pause   3  25:07   0.66% netdata




Code:
 ps -ax |grep "[p]ython"
  245  -  S	  341:17.81 python3.6: middlewared (python3.6)
 2883  -  S	   94:45.80 /usr/local/bin/python /usr/local/bin/snmp-agent.py (python3.6)
 3537  -  S		4:30.83 python: alertd (python3.6)
 5610  -  SJ	   0:31.62 /usr/pbi/plexmediaserver-amd64/bin/python2.7 /usr/pbi/plexmediaserver-amd64/control.py start 192.168.1.181 12346
80975  -  S		1:13.30 python: webshelld (python3.6)
 6776 v0  IWs	  0:00.00 /usr/local/bin/python /etc/netcli (python3.6)





Netdata
Z5dagEp.jpg


QarwxdK.jpg




TOP
X5islP6.jpg



wLWk4bZ.jpg




After all this I was also seeing the errors below from collectd, which I think might be it trying to collect stats on a path that no longer exists, but I'm not sure.

Code:
Dec 20 08:26:51 Vault collectd[66204]: statvfs(/mnt/DeepEnd/VMBackups/Ludicrous) failed: No such file or directory
Dec 20 08:27:01 Vault collectd[66204]: statvfs(/mnt/DeepEnd/VMBackups/Ludicrous) failed: No such file or directory
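
That path looks like a VM backup dataset or directory that no longer exists; just guessing at the cause, a quick check would be:

Code:
# does the mountpoint still exist, and is there still a dataset behind it?
ls -ld /mnt/DeepEnd/VMBackups/Ludicrous
zfs list | grep Ludicrous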





Newest load averages:
Code:
last pid: 97303;  load averages: 44.43, 30.93, 24.60																		up 4+22:33:10  08:46:59
105 processes: 1 running, 103 sleeping, 1 zombie
CPU:  5.1% user,  0.1% nice, 18.7% system,  0.0% interrupt, 76.1% idle
Mem: 189M Active, 219M Inact, 77M Laundry, 15G Wired, 139M Free
ARC: 1582M Total, 791M MFU, 488M MRU, 740K Anon, 24M Header, 278M Other
	 976M Compressed, 2451M Uncompressed, 2.51:1 Ratio
Swap: 8192M Total, 4242M Used, 3950M Free, 51% Inuse, 5456K In, 23M Out

  PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
  245 root		946  20	0  1053M 56308K uwait   7 372:02 148.96% python3.6
84344 root		 38  20	0  2121M 34592K kqread  4  37.4H  17.37% bhyve
54029 netdata	  16  52   19   474M 17616K pause   3  25:20   0.69% netdata
 2394 root		  4  20	0  6252K  1796K rpcsvc  4  19:50   0.40% nfsd
51900 netdata	   1  39   19  8844K  3016K nanslp  6   0:33   0.28% apps.plugin
 2883 root		  5  22	0 53492K 14944K select  5  95:10   0.26% python3.6
 3608 root		 16  44	0 56660K 12332K uwait   7  11:21   0.14% consul
 4828 root		 24  20	0   264M 30548K kqread  4   3:53   0.13% uwsgi
52150 netdata	   1  52   19  8100K  2944K piperd  7   0:16   0.12% bash
97206 root		  1  20	0  7948K  4156K CPU4	4   0:00   0.10% top
 3488 nobody		1  20	0  7144K  2360K select  5   0:12   0.07% mdnsd
80975 root		  2  20	0 40132K  6584K select  6   1:14   0.02% python3.6




Code:
last pid:  3232;  load averages: 17.99, 34.68, 35.82																		up 5+02:56:33  13:10:22
117 processes: 1 running, 115 sleeping, 1 zombie
CPU:  6.1% user,  0.0% nice, 20.5% system,  0.2% interrupt, 73.1% idle
Mem: 154M Active, 3112K Inact, 18M Laundry, 15G Wired, 177M Free
ARC: 1530M Total, 774M MFU, 450M MRU, 3588K Anon, 22M Header, 280M Other
	 931M Compressed, 2422M Uncompressed, 2.60:1 Ratio
Swap: 8192M Total, 4355M Used, 3836M Free, 53% Inuse, 7004K In, 2420K Out

  PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
  245 root	   1075  20	0  1070M 35540K uwait   5 717:32 146.66% python3.6
84344 root		 38  20	0  2121M 20500K kqread  0  38.4H  13.38% bhyve
 2879 root		  1  20	0   675M 10792K swread  3   8:19   3.56% snmpd
66204 root		 12  20	0 97912K  7428K nanslp  1   8:19   1.01% collectd
54029 netdata	  11  52   19   477M 17352K pause   3  27:10   0.73% netdata
 6843 root		  5  20	0 31976K  2352K kqread  1   1:08   0.72% syslog-ng
 4828 root		 28  20	0   269M 12204K uwait   1   4:21   0.37% uwsgi
42654 netdata	   1  39   19  8844K  3032K swread  4   0:40   0.32% apps.plugin
 2394 root		  4  20	0  6252K  1796K zio->i  5  20:47   0.26% nfsd
 2883 root		  5  24	0 53492K 14024K select  5  98:57   0.26% python3.6
 3608 root		 16  43	0 56660K  8132K uwait   1  12:20   0.20% consul
 2142 root		  1 -52   r0  3520K  3584K nanslp  3   4:06   0.15% watchdogd
42821 netdata	   1  52   19  8100K  2924K wait	3   0:17   0.11% bash
 2574 root		  1  20	0  7948K  3304K CPU6	6   0:00   0.07% top
 3488 nobody		1  20	0  7144K  2216K swread  1   0:14   0.05% mdnsd
 3537 root		  7  20	0   119M  9308K swread  2   4:42   0.03% python3.6
  483 root		  1  20	0   172M  7816K swread  2   0:00   0.03% smbd
80975 root		  2  20	0 40132K  6536K select  3   1:18   0.03% python3.6
 3196 root		  1  21	0 59324K 26148K uwait   6   0:00   0.03% dtrace
 1777 root		  1  20	0  9172K	96K select  7   0:01   0.03% devd
72969 root		  1  20	0 13216K  4756K select  4   0:00   0.02% sshd
 3209 root		  1  22	0 59324K 26148K uwait   6   0:00   0.01% dtrace
 5610 root		  6  20	0   149M  1000K select  1   0:33   0.01% python2.7
 2475 root		  1  20	0  6340K  1888K select  2   0:42   0.01% powerd
 2578 uucp		  1  20	0 18896K  2168K select  3   0:38   0.01% upsd
72957 root		  1  20	0 13216K  4784K select  6   0:01   0.01% sshd


 
Last edited:

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
Error when the UI loaded:

TEB354z.jpg
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
It looks to me like the middlewared Python process has a ton of threads. I compared my test 11.2 system against this broken 11.1 system; the results are below. 11.1 has 1,076 threads, and 11.2 has 24.
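
If anyone else wants to watch the growth, something like the loop below should log the thread count once a minute. PID 245 is specific to my box (check yours with pgrep -f middlewared), and it needs to run under sh since the default root shell is csh:

Code:
#!/bin/sh
# append a timestamped middlewared thread count every minute
# (substitute your own middlewared PID for 245)
while true; do
    echo "$(date '+%Y-%m-%d %H:%M:%S') $(procstat -t 245 | wc -l)" >> /tmp/middlewared_threads.log
    sleep 60
done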

11.2 System (working)
WQnX28g.jpg

11.1 System (broken)
Code:
Vault# procstat -t 245
  PID	TID COMM				TDNAME			  CPU  PRI STATE   WCHAN
  245 100683 python3.6		   -					-1  120 sleep   select
  245 100687 python3.6		   -					-1  120 sleep   select
  245 101193 python3.6		   -					-1  120 sleep   umtxn
  245 101200 python3.6		   -					-1  120 sleep   umtxn
  245 101204 python3.6		   -					-1  120 sleep   usem
  245 101205 python3.6		   -					-1  120 sleep   usem
  245 101206 python3.6		   -					-1  120 sleep   usem
  245 101207 python3.6		   -					-1  120 sleep   usem
  245 101208 python3.6		   -					-1  120 sleep   usem
  245 101209 python3.6		   -					-1  120 sleep   usem
  245 101210 python3.6		   -					-1  120 sleep   usem
  245 101211 python3.6		   -					-1  120 sleep   usem
  245 101213 python3.6		   -					-1  120 sleep   usem
  245 101214 python3.6		   -					-1  120 sleep   umtxn
  245 101216 python3.6		   -					-1  120 sleep   usem
  245 101219 python3.6		   -					-1  152 sleep   usem
  245 101348 python3.6		   -					-1  120 sleep   umtxn
  245 101353 python3.6		   -					-1  120 sleep   umtxn
  245 101356 python3.6		   -					-1  120 sleep   umtxn
  245 101388 python3.6		   -					-1  120 sleep   umtxn
  245 101405 python3.6		   -					-1  120 sleep   umtxn
  245 101426 python3.6		   -					-1  120 sleep   umtxn
  245 101438 python3.6		   -					-1  120 sleep   umtxn
  245 101449 python3.6		   -					-1  120 sleep   umtxn
  245 101474 python3.6		   -					-1  120 sleep   umtxn
  245 101475 python3.6		   -					-1  120 sleep   umtxn
  245 101476 python3.6		   -					-1  120 sleep   umtxn
  245 101557 python3.6		   -					-1  120 sleep   umtxn
  245 101583 python3.6		   -					-1  120 sleep   umtxn
  245 101587 python3.6		   -					-1  120 sleep   umtxn
  245 101841 python3.6		   -					-1  120 sleep   umtxn
  245 101844 python3.6		   -					-1  120 sleep   umtxn
  245 101858 python3.6		   -					-1  120 sleep   umtxn
  245 101859 python3.6		   -					-1  120 sleep   umtxn
  245 101860 python3.6		   -					-1  120 sleep   umtxn
  245 101862 python3.6		   -					-1  120 sleep   umtxn
  245 101863 python3.6		   -					-1  120 sleep   umtxn
  245 101864 python3.6		   -					-1  120 sleep   umtxn
  245 101865 python3.6		   -					-1  120 sleep   umtxn
  245 101870 python3.6		   -					-1  120 sleep   umtxn
  245 101871 python3.6		   -					-1  120 sleep   umtxn
  245 101872 python3.6		   -					-1  120 sleep   umtxn
  245 101874 python3.6		   -					-1  120 sleep   umtxn
  245 101875 python3.6		   -					-1  120 sleep   umtxn
  245 101876 python3.6		   -					-1  120 sleep   umtxn
  245 101878 python3.6		   -					-1  120 sleep   umtxn
  245 101879 python3.6		   -					-1  120 sleep   umtxn
  245 101880 python3.6		   -					-1  120 sleep   umtxn
  245 101881 python3.6		   -					-1  120 sleep   umtxn
  245 101882 python3.6		   -					-1  120 sleep   umtxn
  245 101884 python3.6		   -					-1  120 sleep   umtxn
  245 101886 python3.6		   -					-1  120 sleep   umtxn
  245 101887 python3.6		   -					-1  120 sleep   umtxn
  245 101888 python3.6		   -					-1  120 sleep   umtxn
  245 101890 python3.6		   -					-1  120 sleep   umtxn
  
I REMOVED 800 THREADS IN umtxn SLEEP STATE FROM THE OUTPUT SO I COULD FIT IT IN A POST
  
Vault# procstat -t 245 |wc -l
	1076
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
I rebooted the server; it's responding now.

Code:

last pid: 14507;  load averages:  0.40,  0.50,  0.28																		up 0+00:08:49  14:34:57
80 processes:  1 running, 79 sleeping
CPU:  3.0% user,  0.1% nice, 10.2% system,  0.1% interrupt, 86.7% idle
Mem: 3234M Active, 102M Inact, 6790M Wired, 5717M Free
ARC: 2087M Total, 500M MFU, 1534M MRU, 2368K Anon, 32M Header, 18M Other
	 1873M Compressed, 3240M Uncompressed, 1.73:1 Ratio
Swap: 8192M Total, 8192M Free

  PID USERNAME	THR PRI NICE   SIZE	RES STATE   C   TIME	WCPU COMMAND
 7031 root		 36  20	0  2112M  1503M kqread  2   2:11  83.06% bhyve
14490 root		  1  23	0 59324K 26272K uwait   1   0:00   5.15% dtrace
 3490 netdata	  10  52   19   422M   394M pause   6   0:02   0.36% netdata
 2888 root		  5  22	0 53236K 35820K select  4   0:07   0.34% python3.6
 2399 root		  4  20	0  6252K  2140K zilog-  3   0:02   0.30% nfsd
 3494 netdata	   1  39   19  8844K  3620K nanslp  4   0:01   0.21% apps.plu


Vault# ps -ax |grep [p]ython
  245  -  S	 0:10.77 python3.6: middlewared (python3.6)
 2888  -  S	 0:06.92 /usr/local/bin/python /usr/local/bin/snmp-agent.py (python3.6)
 3736  -  I	 0:04.07 python: alertd (python3.6)
 5802  -  SJ	0:01.20 /usr/pbi/plexmediaserver-amd64/bin/python2.7 /usr/pbi/plexmediaserver-amd64/control.py start 192.168.1.181 12346
 7019 v0  Is+   0:01.86 /usr/local/bin/python /etc/netcli (python3.6)
Vault# procstat -t 245
  PID	TID COMM				TDNAME			  CPU  PRI STATE   WCHAN
  245 101185 python3.6		   -					-1  120 sleep   kqread
  245 101199 python3.6		   -					-1  120 sleep   select
  245 101202 python3.6		   -					-1  120 sleep   usem
  245 101203 python3.6		   -					-1  120 sleep   usem
  245 101204 python3.6		   -					-1  120 sleep   usem
  245 101205 python3.6		   -					-1  120 sleep   usem
  245 101206 python3.6		   -					-1  120 sleep   usem
  245 101207 python3.6		   -					-1  120 sleep   usem
  245 101208 python3.6		   -					-1  120 sleep   usem
  245 101209 python3.6		   -					-1  120 sleep   usem
  245 101210 python3.6		   -					-1  120 sleep   usem
  245 101212 python3.6		   -					-1  120 sleep   usem
  245 101214 python3.6		   -					-1  120 sleep   usem
  245 101219 python3.6		   -					-1  152 sleep   usem
  245 101517 python3.6		   -					-1  120 sleep   select
  245 101520 python3.6		   -					-1  120 sleep   select
  245 101526 python3.6		   -					-1  120 sleep   select
  245 101530 python3.6		   -					-1  120 sleep   select
  245 102878 python3.6		   -					-1  120 sleep   select
  245 102880 python3.6		   -					-1  120 sleep   usem
Vault# procstat -t 245 |wc -l
	  21

 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
Had to reboot the server last night, as the virtual machine that was running was not functioning well at all; on top of that, the UI was extremely slow. I think I am going to revert later today.
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
Reverted to 11.0-U4; will see how it goes.
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
Since reverting, everything has been good.
 

cancel

Cadet
Joined
Aug 14, 2017
Messages
5
Seeing the same behavior in system load and middlewared. Went back to 11.0-U4, which worked fine for 3 months, until this one is fixed.
 

Opeio

Cadet
Joined
Jan 24, 2015
Messages
6
I can confirm I'm seeing the same behavior on my homelab system running 11.1. I have not run FreeNAS on this system before, so I'll try 11.0-U4 and report back in this post with specs and such.
 

majerus

Contributor
Joined
Dec 21, 2012
Messages
126
Sorry to hear you're having problems. I think this is good feedback, though, that something is wrong with 11.1. I wish someone would respond with steps to troubleshoot this issue.
 

Opeio

Cadet
Joined
Jan 24, 2015
Messages
6
Just wanted to update: I've been running U4 now for a few days. I haven't had the UI sluggishness that I was experiencing before, just the standard need to refresh every now and then in Chrome. It responds quickly and plugins are working correctly without bogging down.

Specs for the machine are listed in the signature.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Sorry to hear you're having problems. I think this is good feedback, though, that something is wrong with 11.1. I wish someone would respond with steps to troubleshoot this issue.

I have posted in a couple of places here about some weird things on 11.1, and while they may end up with answers shortly, it looks like I am having the same issues. I was wondering if you have had any success with 11.1-U4? I was stable on 11.0-U4 for a while and decided to upgrade, and so far I am not at all a fan.

I have SSH hangs, my web UI had the same timeout issue, and I had some Samba issues on my monitor plugged into the NAS, which frightened me enough to reinstall and start fresh yesterday. The pool was fine, but this is definitely a concern... I don't know why this is happening, and this install has only been live for a little over 24 hours.

I didn't save a screenshot of the timeout in the web UI, but from memory I know it mostly looked the same. I am not sure what the Environment: part said; like I said, at that point I was worried about data integrity and I shut it down until I could reformat the next day.

Any advice? I am currently looking at my processes and I don't see anything crazy. I have 102 processes: 1 running, 99 sleeping, 1 zombie. While this does seem a bit high, it isn't nearly as many as you had...

**********
Update:

Well, I decided I had to try and figure out wtf is going on. I started turning off jails one at a time, since I can't see it being FreeNAS itself; too many people are on it for it to be this unstable. To me it felt like an I/O issue or something. top didn't show any CPU load, and there weren't too many processes...
But I did stop my Syncthing service in its jail, and as soon as I did, things looked to be fine. I am not sure if it's Syncthing itself, or the new option recently added to allow active "watching" for changes in the directories being monitored; I wonder if that is causing some abnormal behavior. In the same vein, I made sure Plex isn't watching for changes either...

As of now, it looks to be working ok. But if anything goes sideways I will for sure post again.
 
Last edited:

Brad1976

Dabbler
Joined
Mar 31, 2015
Messages
40
I wish I was so lucky... I managed to disable the file watcher in Syncthing but still had many, many hard manual shutdowns... I ended up disabling Syncthing and haven't had it running since Saturday night... I won't be able to get back to my server until Wednesday night, but because of your comment I'll be sure to check my Plex config as well!!! And then patiently troll and watch for a resolution... *sigh* lol
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
I wish I was so lucky... I managed to disable the file watcher in Syncthing but still had many, many hard manual shutdowns... I ended up disabling Syncthing and haven't had it running since Saturday night... I won't be able to get back to my server until Wednesday night, but because of your comment I'll be sure to check my Plex config as well!!! And then patiently troll and watch for a resolution... *sigh* lol

I have actually since turned my Syncthing watcher back on, and with the Plex watcher still remaining off, everything looks to be working normally. Does this make sense? Not really... killing Syncthing is what restored normal operation before. Maybe there is some weird I/O conflict with multiple watchers? I am not sure; I have yet to try and diagnose the issue further. I am running into other issues and unfortunately not getting much help through the forums. I may very well end up back on 11.0 or even 9.x, as I am having so many issues I never previously had.


Sent from my iPhone using Tapatalk
 

Brad1976

Dabbler
Joined
Mar 31, 2015
Messages
40
I have actually since turned my Syncthing watcher back on, and with the Plex watcher still remaining off, everything looks to be working normally. Does this make sense? Not really... killing Syncthing is what restored normal operation before. Maybe there is some weird I/O conflict with multiple watchers? I am not sure; I have yet to try and diagnose the issue further. I am running into other issues and unfortunately not getting much help through the forums. I may very well end up back on 11.0 or even 9.x, as I am having so many issues I never previously had.


Sent from my iPhone using Tapatalk

When I ended up turning Syncthing back on last weekend, everything was running fine again! I did, however, ensure that the "watcher" was unchecked in the advanced tab for all my devices, including the FreeNAS server itself... Wish I knew what the cause was 100%, though...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Hell of a lot of swap in use. It’s probably swapping like crazy.
LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Hell of a lot of swap in use. It’s probably swapping like crazy.

Hmm, good point. It is a weird issue, and that sounds like a valid explanation. It’s strange: if I turn on one folder’s watch option it seems relatively fine, but if I turn a couple more on, the entire box goes to an almost completely unusable state and it’s hard to even jexec in to stop the service.
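
Next time it bogs down I'll try to confirm the swap theory before rebooting. swapinfo is standard FreeBSD, and sorting top by resident size should at least hint at what got pushed out; the -w swap column is something I haven't verified on this build:

Code:
swapinfo -h          # overall swap usage, human-readable
top -b -o res 15     # one-shot snapshot of the top 15 processes by resident memory
top -b -w -o res 15  # same, with a per-process swap column if -w is supported here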


Sent from my iPhone using Tapatalk
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Say Plex was writing something into the directory to mark it as unmodified... and say Syncthing was too...

Round One.... Fight.
 

LIGISTX

Guru
Joined
Apr 12, 2015
Messages
525
Say Plex was writing something into the directory to mark it as unmodified... and say Syncthing was too...

Round One.... Fight.
Well, none of my files would be touched by both Plex and Syncthing.


Sent from my iPhone using Tapatalk
 