Middlewared constantly stops

cflnet

Cadet
Joined
Aug 14, 2022
Messages
9
Good Morning Everyone,
I am having an issue with a new TrueNAS Core setup. It seemed to be running fine for a few weeks but now the middlewared service keeps failing. The file shares are still working when it does but I am unable to access the web portal. I am able to SSH into the unit and do a restart of the service to get it running again but I need to know the cause of the issue so I can stop this from happening.

PC Specs:
AMD Ryzen 7 5700G
16GB RAM
3 Samsung 1TB SSDs (1 for OS, 2 for data but not in a mirror as of yet. The second drive was just added and we haven't had an opportunity to schedule downtime to convert it to a mirror)

There is also an hourly cloud sync task for BackBlaze.

I originally created a Cron task to run an rsync command in order to copy all data from the production drive to the spare. This was just to give a local backup until the mirror was able to be set up. However, that seemed to make things worse and the middlewared crash seemed to happen more often.

Any suggestions/help would be much appreciated!
 

homer27081990

Patron
Joined
Aug 9, 2022
Messages
321
Log in to SSH, type
less /var/log/messages
ifconfig
sysctl -a | less
and post the output. Be careful about sensitive data; replace it with Xs.
 

cflnet

messages log...
Aug 14 00:00:00 truenas newsyslog[7896]: logfile turned over due to size>200K
Aug 14 00:00:00 truenas syslog-ng[3113]: Configuration reload request received, reloading configuration;
Aug 14 00:00:00 truenas syslog-ng[3113]: Configuration reload finished;
Aug 14 22:10:05 truenas swap_pager: out of swap space
Aug 14 22:10:05 truenas swp_pager_getswapspace(19): failed
Aug 14 22:10:05 truenas kernel: pid 14479 (rclone), jid 0, uid 0, was killed: failed to reclaim memory
Aug 14 22:10:05 truenas kernel[3113]: Last message 'pid 14479 (rclone), ' repeated 1 times, suppressed by syslog-ng on truenas.local
Aug 14 22:10:05 truenas kernel: pid 6945 (python3.9), jid 0, uid 0, was killed: failed to reclaim memory
Aug 14 22:11:05 truenas kernel[3113]: Last message 'pid 6945 (python3.9)' repeated 1 times, suppressed by syslog-ng on truenas.local
Aug 14 22:14:06 truenas 1 2022-08-14T22:14:06.166902-07:00 truenas.local collectd 1664 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 281, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 216, in connect
self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
Aug 14 22:19:06 truenas 1 2022-08-14T22:19:06.151873-07:00 truenas.local collectd 1664 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 281, in __init__
self._ws.connect()
File "/usr/local/lib/python3.9/site-packages/middlewared/client/client.py", line 124, in connect
rv = super(WSClient, self).connect()
File "/usr/local/lib/python3.9/site-packages/ws4py/client/__init__.py", line 216, in connect
self.sock.connect(self.bind_addr)
ConnectionRefusedError: [Errno 61] Connection refused
Aug 14 22:24:06 truenas 1 2022-08-14T22:24:06.157553-07:00 truenas.local collectd 1664 - - Traceback (most recent call last):
File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
with Client() as c:

ifconfig

re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
description: LAN1
options=201b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,WOL_MAGIC>
ether c0:18:03:b6:bb:5d
inet 192.168.0.112 netmask 0xffffff00 broadcast 192.168.0.255
media: Ethernet autoselect (100baseTX <full-duplex>)
status: active
nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=680003<RXCSUM,TXCSUM,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x2
inet 127.0.0.1 netmask 0xff000000
groups: lo
nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
pflog0: flags=0<> metric 0 mtu 33160
groups: pflog

sysctl

kern.ostype: FreeBSD
kern.osrelease: 13.1-RELEASE
kern.osrevision: 199506
kern.version: FreeBSD 13.1-RELEASE n245376-eba770b30ff TRUENAS

kern.maxvnodes: 338451
kern.maxproc: 21044
kern.maxfiles: 501190
kern.argmax: 524288
kern.securelevel: -1
kern.hostname: truenas.local
kern.hostid: 1581754328
kern.clockrate: { hz = 1000, tick = 1000, profhz = 8128, stathz = 127 }
kern.posix1version: 200112
kern.ngroups: 1023
kern.job_control: 1
kern.saved_ids: 0
kern.boottime: { sec = 1660434218, usec = 227545 } Sat Aug 13 16:43:38 2022
kern.domainname:
kern.osreldate: 1301000
kern.bootfile: /boot/kernel/kernel
kern.maxfilesperproc: 451071
kern.maxprocperuid: 18939
 

homer27081990

Your RAM seems to be full, and your swap partition too. TrueNAS needs RAM in proportion to the storage space handled by ZFS. Loading more services on top of that, letting a jail take too much RAM, or giving too much RAM to a VM will halt your system. Running out of swap is what triggers the kills in your log, but a bigger swap partition wouldn't fix the underlying problem: either remove services, jails or VMs, or check your RAM. Sidenote: if this is a Ryzen with integrated graphics, those take RAM too. Put in any old GPU, even one from the 2000s, and deactivate the AMD on-chip GPU, at least until you get more RAM. Please post a screenshot of the output of the top, sysctl -a | less and swapinfo -k commands so we can see the RAM usage.
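
The kernel lines in the posted log follow a fixed pattern ("pid N (name), ..., was killed: failed to reclaim memory"), so the kills can be tallied per process name. A minimal Python sketch, assuming the log format shown in this thread; the function name oom_kills and the regular expression are made up for illustration and are not part of TrueNAS:

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   ... kernel: pid 14479 (rclone), jid 0, uid 0, was killed: failed to reclaim memory
# The pattern is inferred from the log excerpt in this thread, not from a spec.
KILL_RE = re.compile(r"pid (\d+) \(([^)]+)\), .*was killed: (.+)$")

def oom_kills(lines):
    """Count killed processes by name from syslog-style lines."""
    counts = Counter()
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            counts[m.group(2)] += 1
    return counts

# Lines copied from the log posted above.
sample = [
    "Aug 14 22:10:05 truenas kernel: pid 14479 (rclone), jid 0, uid 0, was killed: failed to reclaim memory",
    "Aug 14 22:10:05 truenas kernel: pid 6945 (python3.9), jid 0, uid 0, was killed: failed to reclaim memory",
    "Aug 14 22:10:05 truenas swap_pager: out of swap space",
]
print(oom_kills(sample))
```

To run it against the real file, pass an open file handle instead of the sample list, for example oom_kills(open("/var/log/messages")).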
 

cflnet

That is pretty odd considering there are no VMs or Jails. The system is solely acting as a file share and backing up to BackBlaze. The only services that are turned on are SMART, SMB and SSH.

Results of top sysctl -a

top sysctl -a | less
last pid: 30986; load averages: 0.10, 0.19, 0.18 up 1+21:14:33 13:58:11
62 processes: 1 running, 61 sleeping
CPU: 0.3% user, 0.0% nice, 0.1% system, 0.0% interrupt, 99.5% idle
Mem: 319M Active, 3622M Inact, 1537M Laundry, 7880M Wired, 424M Free
ARC: 6013M Total, 4611M MFU, 512M MRU, 256K Anon, 40M Header, 849M Other
4492M Compressed, 6462M Uncompressed, 1.44:1 Ratio
Swap: 2048M Total, 621M Used, 1426M Free, 30% Inuse

swapinfo -k

Device 1K-blocks Used Avail Capacity
/dev/mirror/swap0.eli 2097152 636364 1460788 30%

I believe it is an integrated GPU. How would I go about deactivating it?

However, I feel that would be a workaround. Since nothing else is running, wouldn't this point to some kind of memory leak?
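
For reading the top output above, it can help to add up the categories on the Mem line and to separate the ZFS ARC from the rest of the wired memory. A small Python sketch, assuming (as is usual on FreeBSD) that the ARC is counted inside Wired; the helper name parse_top_mem is hypothetical and only the "M" suffix is handled:

```python
import re

def parse_top_mem(line):
    """Turn a FreeBSD top 'Mem:' line into {label: MiB}; only the M suffix is handled."""
    return {label: int(value) for value, label in re.findall(r"(\d+)M (\w+)", line)}

# Values copied from the top output posted above.
mem = parse_top_mem("Mem: 319M Active, 3622M Inact, 1537M Laundry, 7880M Wired, 424M Free")
arc_total = 6013  # "ARC: 6013M Total" from the same output

total = sum(mem.values())
non_arc_wired = mem["Wired"] - arc_total

print("accounted for by top:", total, "MiB")
print("wired but not ARC:", non_arc_wired, "MiB")
```

With these figures the five categories add up to 13782M, and about 1867M of the wired memory is not ARC. This is a single snapshot taken while the system was idle, so it does not by itself confirm or rule out a leak.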
 

cflnet

Also, if I run just top to see what is running, rclone seems to be running pretty high. I am assuming that is the cloud sync service to BackBlaze that is currently running?

26105 root 211 20 0 7362M 6118M uwait 4 13:58 3.20% rclone
 

homer27081990

Also, if I run just top to see what is running, rclone seems to be running pretty high. I am assuming that is the cloud sync service to BackBlaze that is currently running?
It may be an intermittent RAM issue, or overheating, or a failing PSU... You are right. Run memtest86+ on it, check temps under load, try a new or more powerful PSU... Also, for when this is over (I wish that for you), you can deactivate all graphics in the BIOS (CPU options, integrated graphics). But it is easier (and more productive) to just put in a basic card that has even a little VRAM.
 

homer27081990

Also, if I run just top to see what is running, rclone seems to be running pretty high. I am assuming that is the cloud sync service to BackBlaze that is currently running?
Also, could you just monitor it (every half-hour) and see if the RAM readings change? It could be a gradual issue.
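
One way to take those half-hourly readings without watching top by hand is a small script left running in an SSH session. A rough Python sketch, assuming FreeBSD's vm.stats.vm.v_free_count and hw.pagesize sysctls; the function names, the interval and the sample count are illustrative choices, not TrueNAS features:

```python
import subprocess
import time

def read_sysctl(name):
    """Return an integer sysctl value via `sysctl -n` (FreeBSD)."""
    return int(subprocess.check_output(["sysctl", "-n", name], text=True).strip())

def free_mib(read=read_sysctl):
    """Free memory in MiB: free page count times page size."""
    pages = read("vm.stats.vm.v_free_count")
    page_size = read("hw.pagesize")
    return pages * page_size // (1024 * 1024)

def monitor(interval_s=1800, samples=48, read=read_sysctl):
    """Print a timestamped free-memory reading every interval_s seconds."""
    for i in range(samples):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), free_mib(read), "MiB free", flush=True)
        if i < samples - 1:
            time.sleep(interval_s)

# Example (not run here): monitor() samples every 30 minutes for 24 hours.
```

Redirecting the output to a file on the data pool would keep the readings even if the session drops. The read parameter exists only so the arithmetic can be checked without a FreeBSD host.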
 

cflnet

We'll have to schedule a time to take it down to test the memory but I am skeptical that it is a hardware issue. It is a new PC and I am pretty sure it is spec'd properly in terms of memory and PSU. However, I know new stuff is bad off the shelf sometimes as well.

The memory only spikes while the BackBlaze cloud sync is happening. I have observed it go from 10GB free to 0.4GB free when the sync starts. Is there some sort of setting we may have missed when setting up the sync that would cause this?
 

homer27081990

We'll have to schedule a time to take it down to test the memory but I am skeptical that it is a hardware issue. It is a new PC and I am pretty sure it is spec'd properly in terms of memory and PSU. However, I know new stuff is bad off the shelf sometimes as well.

The memory only spikes while the BackBlaze cloud sync is happening. I have observed it go from 10GB free to 0.4GB free when the sync starts. Is there some sort of setting we may have missed when setting up the sync that would cause this?
Probably some kind of bug or misconfiguration. Some loop, maybe? Loading too much in memory? We need to take a look at the BackBlaze logs, if there are any.
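
On the settings question: rclone's B2 backend documentation notes that upload chunks for large files are buffered in memory, with up to --transfers chunks in progress at once (newer rclone releases also multiply this by --b2-upload-concurrency). A rough Python sketch of that upper bound follows. The numbers are illustrative assumptions, not values read from this system, and the helper name is hypothetical; check the installed rclone version and the Transfers setting of the cloud sync task before relying on it:

```python
def b2_chunk_buffer_mib(transfers, chunk_size_mib, upload_concurrency=1):
    """Rough upper bound on memory held by B2 upload chunk buffers, in MiB."""
    return transfers * upload_concurrency * chunk_size_mib

# Illustrative values only: 4 transfers and 96 MiB chunks are assumed here.
estimate = b2_chunk_buffer_mib(transfers=4, chunk_size_mib=96)
print(estimate, "MiB")
```

Under these assumptions the estimate is a few hundred MiB, well below the 6118M resident size that top reported for rclone, so chunk buffering alone would not obviously explain the spike.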
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Have you applied the known Ryzen fixes for stability?
  • Disable Cool-n-Quiet
  • Disable C6 states
 

cflnet

Sorry, I know it has been a while; I forgot to update this. It turns out the issue was the BackBlaze sync task. We removed it and everything seemed to work properly. We later set it back up, and it has continued to work with no issues. For some reason that task was getting stuck and maxing out the RAM, which caused the middlewared service to stop.
 