Scale 22.02.0 crashing due to OOM?

Che^ron

Cadet
Joined
Jun 12, 2014
Messages
6
Have had three instances in the last week where Scale became unresponsive and I had to hard reset to bring it back. /var/log/messages seems to indicate a memory issue, but I'm not sure where it is occurring. I can see oom-killer working its way through every system process until eventually nothing is running. Seems to be a snowball effect of sorts.

I have 32GB of memory and am running multiple docker containers.

Has anyone come across anything similar?

/var/log/messages sample -- https://pastebin.com/S501JG11
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Can you itemize the containers and provide some stats on RAM usage during normal operation?
 

Che^ron

Cadet
Joined
Jun 12, 2014
Messages
6
Good morning morganL,

The usual home media stuff here. RAM usage has been fine since installing Scale approx. 8 weeks ago, and this problem seems to have cropped up within just the last week for some reason. I cannot (yet) think of anything that has changed over that timeline.

Running containers are:
  1. Deluge
  2. Radarr
  3. Sonarr
  4. Plex
  5. Prowlarr
  6. FlareSolverr
  7. Ubiquiti Unifi controller
  8. AutoBRR
Didn't make it 12h just now. Checked in on the server this morning and was unable to SSH into Scale or access the web UI. All containers down. The system is killing all containers and all system services trying to regain memory. Logs show it starts off slowly attempting to make room with oom-killer, and then the logs are full of repeated attempts at much shorter intervals than initially.

Memory/swap usage graph:

What steps can I take to discover what's happening here? I've tried sifting through logs but have found no smoking gun per se. Happy to upload logs anywhere to be viewed (though my /var/log/messages is too large for pastebin, it seems).

Thanks in advance!
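
For a /var/log/messages that is too large to paste, one option is to tally the oom-killer victims by process name and see which ones come up first and largest. A minimal sketch, assuming the kernel's usual "Out of memory: Killed process <pid> (<name>) ... anon-rss:<n>kB" line format; the function names here are made up for illustration and the pattern may need adjusting if your lines look different:

```python
import re
from collections import Counter

# Assumed kernel log format; adjust the pattern if your lines differ.
KILL_RE = re.compile(r"Killed process (\d+) \(([^)]+)\).*?anon-rss:(\d+)kB")


def summarize_oom_kills(lines):
    """Return (kill count, largest anon-rss in kB) per killed process name."""
    counts = Counter()
    peak_rss_kb = {}
    for line in lines:
        match = KILL_RE.search(line)
        if not match:
            continue  # not an oom-killer "Killed process" line
        name = match.group(2)
        rss_kb = int(match.group(3))
        counts[name] += 1
        peak_rss_kb[name] = max(peak_rss_kb.get(name, 0), rss_kb)
    return counts, peak_rss_kb


def report(path="/var/log/messages", limit=10):
    """Print the most frequently killed process names found in the log."""
    with open(path, errors="replace") as handle:
        counts, peak = summarize_oom_kills(handle)
    for name, kills in counts.most_common(limit):
        print(f"{name:20s} kills={kills:4d} peak_anon_rss={peak[name] / 1024:.0f} MiB")
```

Calling `report()` as root should print the top victims. Note that the processes killed most often are not necessarily the ones leaking; a very large anon-rss on the first few kills is usually the more useful hint.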
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
Looks like a memory leak..... or an application that is consuming RAM?

I think the command you need is something like:

k3s kubectl top pod

Or turn one application on at a time...
 

Che^ron

Cadet
Joined
Jun 12, 2014
Messages
6
Hey I appreciate you taking the time to reply. Trying to run that I get:
W0314 11:53:18.195535 609462 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
error: Metrics API not available
 

warllo

Contributor
Joined
Nov 22, 2012
Messages
117
I have been experiencing the very same symptoms as Che^ron. Every few days or weeks the box will hard lock and need to be reset. I have 32GB of RAM. The only container I have in common with the OP is Plex. I will check for similar error logs tonight when I get home from work, but I suspect similar results.

When running k3s kubectl top pod I also get the error below.

W0314 11:53:18.195535 609462 top_pod.go:140] Using json format to get metrics. Next release will switch to protocol-buffers, switch early by passing --use-protocol-buffers flag
error: Metrics API not available
 

Whiskydrinker

Dabbler
Joined
Mar 15, 2022
Messages
17
Has anyone come across anything similar?
Kind of, as I have posted here. I'm running basic file services, no Kubernetes, and one virtual machine on my server. In my case the middlewared processes hog more and more memory over time until Scale runs out of memory.

So you could check whether middlewared also starts hogging memory over time on your system.
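
A rough way to watch for that is to sample the resident memory of every process whose command line mentions middlewared and compare the numbers over a few hours. A minimal sketch, assuming a Linux /proc filesystem and that matching on the substring "middlewared" in the command line is good enough; the helper names are hypothetical:

```python
import os


def rss_kb_from_status(status_text):
    """Extract VmRSS (in kB) from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def matching_rss_kb(needle="middlewared", proc_root="/proc"):
    """Map pid -> VmRSS in kB for processes whose command line contains needle."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "cmdline"), "rb") as handle:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                cmdline = handle.read().replace(b"\0", b" ").decode(errors="replace")
            with open(os.path.join(base, "status")) as handle:
                rss_kb = rss_kb_from_status(handle.read())
        except OSError:
            continue  # process exited or is not readable
        if needle in cmdline and rss_kb is not None:
            result[int(entry)] = rss_kb
    return result
```

Printing `sum(matching_rss_kb().values()) / 1024` every few minutes gives a MiB total; if it only ever climbs, that points at the same problem I'm seeing.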
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Every few days or weeks the box will hard lock and need to be reset
@warllo do you, by any chance, use a Ryzen CPU? That behavior is common for 1st- and 2nd-gen Ryzen CPUs without BIOS tweaks.
 

Kasazn

Explorer
Joined
Apr 17, 2021
Messages
60
Just to add information on my end, in case it helps.

I have 3 apps running:
- Deluge
- PiHole
- jDownloader2

No issues so far.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
TrueNAS SCALE uses a docker backend for Kubernetes. So, the command to use is:

docker stats


This is an example from an engineering test system

docker-stats.png
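
If the docker stats table is hard to read with many containers, the output can be sorted by memory use. A minimal sketch, assuming the standard `--no-stream` and `--format` options with the `.Name` and `.MemUsage` placeholders, and a MemUsage value shaped like "1.5GiB / 31.3GiB"; the helper names and the `|` separator are illustrative choices:

```python
import re
import subprocess

# Binary units as printed by docker stats (assumed: B, KiB, MiB, GiB, TiB).
UNIT_BYTES = {"B": 1, "KIB": 1024, "MIB": 1024**2, "GIB": 1024**3, "TIB": 1024**4}


def parse_size(text):
    """Convert a size such as '512MiB' to a number of bytes."""
    match = re.fullmatch(r"\s*([\d.]+)\s*([A-Za-z]+)\s*", text)
    if not match or match.group(2).upper() not in UNIT_BYTES:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(match.group(1)) * UNIT_BYTES[match.group(2).upper()]


def parse_stats(output):
    """Parse 'name|used / limit' lines into (name, used_bytes), largest first."""
    rows = []
    for line in output.splitlines():
        if "|" not in line:
            continue  # skip blank or unexpected lines
        name, mem_usage = line.split("|", 1)
        used = mem_usage.split("/", 1)[0]
        rows.append((name.strip(), parse_size(used)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def top_containers(limit=10):
    """Run docker stats once and return the largest memory consumers."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}|{{.MemUsage}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)[:limit]
```

Calling `top_containers()` as root returns the heaviest containers first. A single snapshot only shows current usage, so it is worth taking a few samples over time to spot one that keeps growing.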
 

Che^ron

Cadet
Joined
Jun 12, 2014
Messages
6
Didn't find out what was causing this, but for anyone else who runs into it: reinstalling fresh and restoring my config appears to have resolved the issue.

Cheers.
 