SOLVED TrueNAS hangs every Saturday

Fidelity

Cadet
Joined
Apr 9, 2022
Messages
3
So I've got a very weird issue. Every Saturday around 07:00, TrueNAS hangs. No SSH connection is possible, and not even a console connection (I virtualize using Proxmox). While it hangs it is completely unresponsive, and CPU usage spikes to 100%.

Eventually, when I notice (usually around 10:00 or 11:00), I have to force stop the VM and start it again. Everything then works as expected until the next Saturday morning.

I've checked /var/log/messages, but it's unclear to me what's going on and how to proceed further.

Saturday - April 2:
Code:
Apr  2 07:00:08 freenas proftpd[1076]: 127.0.0.1 - ProFTPD killed (signal 15)
Apr  2 07:00:08 freenas proftpd[1076]: 127.0.0.1 - ProFTPD 1.3.6b standalone mode SHUTDOWN
Apr  2 07:00:09 freenas 1 2022-04-02T07:00:09.112527+02:00 freenas.local ntpd 1051 - - ntpd exiting on signal 15 (Terminated)
Apr  2 07:00:09 freenas syslog-ng[845]: syslog-ng shutting down; version='3.29.1'
Apr  2 11:47:01 freenas syslog-ng[826]: syslog-ng starting up; version='3.29.1'
Apr  2 11:47:01 freenas ---<<BOOT>>---


Saturday - April 9 (today):
Code:
Apr  9 07:00:05 freenas proftpd[1057]: 127.0.0.1 - ProFTPD killed (signal 15)
Apr  9 07:00:05 freenas proftpd[1057]: 127.0.0.1 - ProFTPD 1.3.6b standalone mode SHUTDOWN
Apr  9 07:00:06 freenas 1 2022-04-09T07:00:06.135602+02:00 freenas.local ntpd 1032 - - ntpd exiting on signal 15 (Terminated)
Apr  9 07:00:07 freenas syslog-ng[826]: syslog-ng shutting down; version='3.29.1'
Apr  9 10:29:12 freenas syslog-ng[826]: syslog-ng starting up; version='3.29.1'
Apr  9 10:29:12 freenas ---<<BOOT>>---
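One thing the two excerpts have in common: signal 15 is SIGTERM, so the services were asked to terminate, which looks more like an orderly shutdown than a crash. A minimal Python sketch, assuming the syslog-ng lines above and the year 2022 (the log lines carry no year), pairs each "shutting down" with the next "starting up" to show the weekday and the length of each outage. The function name `outages` is made up for this example.

```python
import re
from datetime import datetime

# syslog-ng lines copied from the two excerpts above.
LOG = """\
Apr  2 07:00:09 freenas syslog-ng[845]: syslog-ng shutting down; version='3.29.1'
Apr  2 11:47:01 freenas syslog-ng[826]: syslog-ng starting up; version='3.29.1'
Apr  9 07:00:07 freenas syslog-ng[826]: syslog-ng shutting down; version='3.29.1'
Apr  9 10:29:12 freenas syslog-ng[826]: syslog-ng starting up; version='3.29.1'
"""

LINE = re.compile(
    r"^(\w{3})\s+(\d+) (\d\d:\d\d:\d\d) \S+ syslog-ng\[\d+\]: "
    r"syslog-ng (shutting down|starting up)"
)


def outages(text, year=2022):
    """Return (down, up) datetime pairs; the year is an assumption."""
    result = []
    down = None
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        month, day, clock, event = match.groups()
        stamp = datetime.strptime(
            f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S"
        )
        if event == "shutting down":
            down = stamp
        elif down is not None:
            result.append((down, stamp))
            down = None
    return result


for down, up in outages(LOG):
    print(down.strftime("%A %Y-%m-%d %H:%M:%S"), "down for", up - down)
```

Both outages start on a Saturday within ten seconds of 07:00 and end only when the VM is restarted by hand.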


Since this is a recurring issue, my first thought was that it could be a scheduled task. But I'm not running any tasks on a Saturday morning. To be sure, here's a list:
Code:
S.M.A.R.T. LONG test: 9 1 * *
S.M.A.R.T. SHORT test: 23 * * mon

SCRUB task pool: 0 0 1 * *
SCRUB task USB-backup: 0 0 1 * *
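To double-check a schedule against the hang time, here is a rough Python sketch of a five-field cron matcher. It is a simplification, not a full cron implementation: it only understands `*` and comma-separated numbers, and it requires day-of-month and day-of-week to match together, whereas standard cron treats them as either/or when both are restricted. The helper names `field_matches` and `fires_at` are made up for this example. Only the two scrub entries are checked, since the S.M.A.R.T. entries as posted have fewer than five fields.

```python
from datetime import datetime


def field_matches(field, value):
    """Match one cron field; supports only '*' and comma-separated integers."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def fires_at(expr, when):
    """True if the five-field cron expression matches the given minute."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # cron numbering: 0 = Sunday, 6 = Saturday
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow)
    )


hang = datetime(2022, 4, 9, 7, 0)  # Saturday 07:00, taken from the log above
tasks = {
    "SCRUB task pool": "0 0 1 * *",
    "SCRUB task USB-backup": "0 0 1 * *",
}
for name, expr in tasks.items():
    print(name, "fires at hang time:", fires_at(expr, hang))
```

Neither scrub matches Saturday 07:00, which fits the conclusion that nothing scheduled inside TrueNAS lines up with the hang.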



It almost seems (as you can see in the logs) as if it has something to do with FTP, but I can't find what exactly. I run daily backups over FTP, and the last backup (and thus the last FTP connection) is done at 04:44. That is more than two hours before 07:00, when the server hangs with a ProFTPD message among the last log entries.


Some advice would be great!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Assuming you've followed the guidance posted in


for virtualization, the two obvious things are to ditch the hypervisor and see if the issue recurs on bare metal, in which case you will probably more easily spot the issue, or replace Proxmox with ESXi. Proxmox is an immature hypervisor for features like PCIe Passthru, and if your system isn't quite 100%, lockups are a common side effect.
 

Fidelity

That's easier said than done :) It all worked perfectly for 2 years or so, and this issue only started a couple of weeks ago, around the same time I migrated from FreeNAS to TrueNAS. I don't see how Proxmox is related to this issue (yet)?
 

Fidelity

Damn... I can't believe you were right @jgreco. At first I didn't think it could have anything to do with Proxmox. But then I checked whether any tasks ran on Proxmox for this VM every Saturday around 07:00. And yep... it stops the VM to make a backup at exactly that time. I totally forgot I set that up.

So thank you for that! The issue isn't fixed, but the topic can be closed as this isn't TrueNAS related anymore. Thanks so much! Sorry for my previous comment :)
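For anyone who lands here with the same symptom, a small Python sketch of the check that found it: scan a cron-style backup entry for a `vzdump` job that uses stop mode. Everything in the sample line is an assumption made for illustration (VM ID 100, the storage name `local`, and the exact schedule fields); the only thing taken from this thread is that a backup job stopped the VM on Saturdays at 07:00. Where the job definition actually lives depends on the Proxmox version, so the sketch works on a string rather than reading a file. The function name `stop_mode_backups` is made up as well.

```python
import shlex

# Hypothetical cron-style backup entry, NOT copied from a real host.
SAMPLE = "0 7 * * 6 root vzdump 100 --mode stop --storage local --quiet 1"


def stop_mode_backups(text):
    """Return (schedule, line) pairs for vzdump entries that use --mode stop."""
    hits = []
    for line in text.splitlines():
        tokens = shlex.split(line, comments=True)
        if "vzdump" not in tokens or "--mode" not in tokens:
            continue
        idx = tokens.index("--mode")
        mode = tokens[idx + 1] if idx + 1 < len(tokens) else None
        if mode == "stop":
            # The first five tokens are the cron schedule fields.
            hits.append((" ".join(tokens[:5]), line))
    return hits


for schedule, line in stop_mode_backups(SAMPLE):
    print("stop-mode backup scheduled at:", schedule)
```

A job in stop mode shuts the guest down for the duration of the backup, which matches the SIGTERM messages in the TrueNAS log at 07:00.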
 

jgreco

That's an awesome outcome. Good job.

The problem with virtualization is that you're stacking a house of cards on a house of cards. For simple VMs like a webserver, "no big deal", but for a thing like TrueNAS with all sorts of I/O demands, etc., it is an extremely difficult workload for a hypervisor.

The Proxmox fanbois are often unhappy with me because I "badmouth Proxmox", but fail to realize that I'm a reactive pragmatist. I know that theoretically hypervisors other than ESXi could work, and there was a time only a decade ago when even ESXi was only ~50%(?) successful due to Nehalem/Westmere, immature PCIe passthru, and similar issues. I push ESXi heavily because it has the greatest likelihood of working, but that doesn't mean I'm happy perpetuating the VMware monopoly on difficult virtualization workloads. I have a love/hate relationship with lots of products, and anything that works, and works reliably, in a given case ... that's great. ;-)

One of the most important things is that we do need guinea pigs who are willing to test this. Proxmox is stable for some people. And the way to increase that sample set is for people who experience failures to drill down and determine what went wrong. I can't really do that from afar, so I just want you to know that you've done good work and that you've advanced the state of usability of Proxmox a bit.
 