Cloud Sync post script / atrun bug ? - cannot open input file c0001701a9b55c: No such file or directory

Mauricio Silveira · Jan 17, 2023

Hi.
I've just spent some time debugging what I think is atrun misbehavior in TrueNAS CORE (not sure if this applies to SCALE).
EDIT: Testing on version TrueNAS-12.0-U8.1

After trying to make sense of how it is set up in TrueNAS (maybe it's the FreeBSD default; I really don't know FreeBSD), I noticed that while the Linux default for atrun is every minute, in TrueNAS it runs every 5 minutes.

Another point is that there's this cron entry at /etc/cron.d/at:

*/5 * * * * root /usr/libexec/atrun

And this one in /etc/crontab:

*/5 * * * * root /usr/libexec/atrun > /dev/null 2>&1

Running a test like:

echo "echo testing >> /tmp/testing.txt" | at now

OR

echo "echo testing >> /tmp/testing.txt" | at now +1 minute

doesn't work, because the two cron entries seem to make atrun run twice at the same time, leading to a conflict when accessing /var/at/jobs/cXXXXXXXXc and generating syslog messages like this:

Jan 17 08:00:00 storage01 1 2023-01-17T08:00:00.026230-03:00 storage01.local.domain atrun 52298 - - cannot open input file c0001401a9b553: No such file or directory
Jan 17 08:05:00 storage01 1 2023-01-17T08:05:00.053245-03:00 storage01.local.domain atrun 52397 - - cannot open input file c0001501a9b557: No such file or directory
Jan 17 08:10:00 storage01 1 2023-01-17T08:10:00.028049-03:00 storage01.local.domain atrun 52497 - - cannot open input file c0001601a9b55a: No such file or directory
Jan 17 08:10:00 storage01 1 2023-01-17T08:10:00.028049-03:00 storage01.local.domain atrun 52499 - - cannot open input file c0001801a9b55b: No such file or directory
Jan 17 08:10:00 storage01 1 2023-01-17T08:10:00.029169-03:00 storage01.local.domain atrun 52498 - - cannot open input file c0001701a9b55c: No such file or directory
Jan 17 08:15:00 storage01 1 2023-01-17T08:15:00.039606-03:00 storage01.local.domain atrun 52609 - - cannot open input file c0001901a9b561: No such file or directory
Jan 17 08:15:00 storage01 1 2023-01-17T08:15:00.039837-03:00 storage01.local.domain atrun 52608 - - cannot open input file c0001a01a9b560: No such file or directory
Jan 17 08:15:00 storage01 1 2023-01-17T08:15:00.040264-03:00 storage01.local.domain atrun 52607 - - cannot open input file c0001b01a9b561: No such file or directory

It will complain as many times as the number of jobs in the queue.
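A quick way to check for the duplicate scheduling described above is to count the active (non-commented) atrun lines across both cron files. This is only a minimal sketch: the function name count_atrun_entries is made up for illustration, and the paths are the ones quoted in this post.

```shell
#!/bin/sh
# Count active (non-commented) cron lines that invoke atrun in the given files.
# count_atrun_entries is a hypothetical helper name, not a TrueNAS tool.
count_atrun_entries() {
    # Drop comment lines first, then count the lines that mention atrun.
    cat "$@" 2>/dev/null | grep -v '^[[:space:]]*#' | grep -c '/usr/libexec/atrun'
}

# Example call (paths taken from the post above):
#   count_atrun_entries /etc/crontab /etc/cron.d/at
```

A result greater than 1 would mean cron starts more than one atrun instance on the same schedule.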

So, as a test, I commented out the cron.d/at entry and changed the /etc/crontab entry to */1, and now I have a "consistent" atrun working with 1-minute precision:

/etc/cron.d/at

#*/5 * * * * root /usr/libexec/atrun

/etc/crontab

*/1 * * * * root /usr/libexec/atrun > /dev/null 2>&1

Am I wrong about this or should I open a bug report for this to get fixed in future releases ?

sretalla · Jan 17, 2023

Mauricio Silveira said:
Am I wrong about this or should I open a bug report for this to get fixed in future releases ?

Well, you're at the very least wrong to be modifying the file directly. That's potentially going to be overwritten at the next reboot or, best case, at the next update.

If you feel you've identified something that's causing the system to behave badly, open a bug report (link at the top of the page).

Cron tasks need to be edited in the GUI in order to persist in the long term.

Mauricio Silveira · Jan 17, 2023

sretalla said:
Well, you're at the very least wrong to be modifying the file directly. That's potentially going to be overwritten at the next reboot or, best case, at the next update.

If you feel you've identified something that's causing the system to behave badly, open a bug report (link at the top of the page).

Cron tasks need to be edited in the GUI in order to persist in the long term.

I didn't really mean to make those changes permanent (though my post did imply it, so I get why you warned about it).
I don't mean for it to be used as a cronjob replacement.

My use case:
I have set up a Cloud Sync task, and I managed to get all the data I need for a Cloud Sync post script. BUT, since the post script is part of the job, I always got the RUNNING state returned for the job when running post-cloud.sh directly:

LASTJOBSTATE="$(midclt call cloudsync.query "[[\"description\", \"=\", \"${CLOUD_SYNC_DESCRIPTION}\"]]" | jq -r '.[0].job.state')"

So, I created a 2-script trigger, post-cloud-trigger.sh and post-cloud.sh:

#!/bin/bash
printenv > "/tmp/env-${CLOUD_SYNC_DESCRIPTION}"
echo "/root/post-cloud.sh \"${CLOUD_SYNC_DESCRIPTION}\"" | at now +2 minute

My goal was to use mostly TrueNAS facilities to complete the job: use midclt to trigger the cloud sync task and gather data, then send an e-mail with the log attached and the job status (SUCCESS/FAILED) for specific cloud sync tasks triggered by a remote ssh command. The 2-minute wait is to ensure I get the data of a finished job (in case the script runs 1 second after the job completes and the data returned by the midclt query has not been updated yet).
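As an alternative to a fixed delay, the script started by at could poll the job state until it is no longer RUNNING. The following is a rough sketch, not a tested TrueNAS recipe: it assumes the midclt query shown earlier keeps reporting the state in .job.state, the names wait_for_final_state and job_state are hypothetical, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Poll a state-printing command until it stops reporting RUNNING.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run.
# Prints the final state and returns 0, or returns 1 if attempts run out.
wait_for_final_state() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        state=$("$@")
        if [ "$state" != "RUNNING" ]; then
            printf '%s\n' "$state"
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical wrapper around the query from this post.
job_state() {
    midclt call cloudsync.query "[[\"description\", \"=\", \"${CLOUD_SYNC_DESCRIPTION}\"]]" | jq -r '.[0].job.state'
}

# Example call from the at-scheduled script: up to 30 tries, 10 seconds apart.
#   LASTJOBSTATE="$(wait_for_final_state 30 10 job_state)"
```

This only makes sense in the script launched through at; inside the post script itself the job would still be RUNNING for as long as the loop waits.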

But I just found out that the post script is not run when the job state is FAILED (I'll have to dig around to find a way around that).
Another possible issue I found with Cloud Sync: I purposely added a non-existent folder as the source for a cloud sync task, and it results in an error:

[EFAULT] Directory '/mnt/tank/STORAGE01/dfghj/pqp' does not exist

The above error does not send an alert mail and, AFAIK, my alert settings are correct.

I tried running the post script directly in the background (&), with disown, etc., but it always becomes a child of the main process, so the only solution I have found so far is to schedule the real post script as an at job.

If anyone has any better options, please tell me :)

sretalla · Jan 18, 2023

Mauricio Silveira said:
2-minute wait is to ensure I get a finished job data

Couldn't you use some variant of sleep in a chained command rather than piping it to at?

Warning... sleep at the CLI is in seconds, not like programmatic sleep which is usually milliseconds in my experience.

so:

sleep 120 && whatever...

Mauricio Silveira · Jan 18, 2023

sretalla said:
Couldn't you use some variant of sleep in a chained command rather than piping it to at?

Warning... sleep at the CLI is in seconds, not like programmatic sleep which is usually milliseconds in my experience.

so:

sleep 120 && whatever...

No matter what I do (bg, sleep, etc.), it stays a child process of the main job.
Adding a sleep will just prolong the task's RUNNING state, so midclt still returns the task as RUNNING.

Using printenv to dump the ENV of the post script to a file, then reading it back in the "real" script triggered by at, was the only way around this that I found.
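For the read-back step, here is a minimal sketch of how the at-triggered script could pull a single variable out of the printenv dump. The helper name read_dumped_var is hypothetical, and it assumes simple variable names and single-line values; since printenv writes unquoted NAME=value lines, the file is parsed rather than sourced.

```shell
#!/bin/sh
# Print the value of variable $2 from a printenv-style dump file $1.
# read_dumped_var is a hypothetical helper; multi-line values are not handled.
read_dumped_var() {
    # Strip the "NAME=" prefix from matching lines and keep the first match.
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Example call inside post-cloud.sh, where $1 is the task description
# passed by the trigger script:
#   DESCRIPTION="$(read_dumped_var "/tmp/env-$1" CLOUD_SYNC_DESCRIPTION)"
```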

I've opened an ISSUE about atrun: Jira ISSUE NAS-119905

Mauricio Silveira · Jan 18, 2023

And here's another issue, about Cloud Sync problems/improvements: Jira ISSUE NAS-119914

Mauricio Silveira · Jan 18, 2023

After digging a little deeper, I now understand why atrun doesn't get any attention: iX "centralized" all cron-related jobs into its DB through the middleware (midclt call cronjob.query), leaving no room for manual cronjob editing.
Linux uses atd, which understands "at now" as "run immediately", but on BSD "at now" means the next time atrun is scheduled to run by cron.

Just too many runs....

Mauricio Silveira · Jan 19, 2023

Gathering interest for these issues, please vote: https://ixsystems.atlassian.net/browse/NAS-119914
