SOLVED How to fix scrub schedule?

Status
Not open for further replies.

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
RESOLVED - See posts #10 and #14

Hi,

Either I am stupid or there is something wrong with the scrub schedule mechanism, because I am not able to schedule it for the day/time I want. I have two pools, and originally I had scrubs scheduled bi-weekly on Monday and Thursday at 3 AM. It worked flawlessly until I had some maintenance to do and had the box shut down for a few days. Sadly, those were the planned scrub days, so neither one ran on schedule. Since then, both scrubs run automatically at the same time (not cool!) every second week. Whatever schedule I use, it simply ignores the day/time.

The main "issue" is with the "threshold" value. The docs say:
"Threshold days - number of days since the last scrub completed before the next scrub can occur, regardless of the calendar schedule; the default is a multiple of 7 which should ensure that the scrub always occurs on the same day of the week."

Sadly the description is not very detailed about how exactly it is connected to the calendar schedule ...

Imagine it is Monday and the first scrub finished today:
- If I set the threshold to "21" and the calendar schedule is set to run in 7 days, will it run next Monday or only after three weeks (basically ignoring the calendar)?
- If I set the threshold to "3" and the calendar schedule is set to run in 7 days, will it run next week or in three days (again ignoring the schedule)?

It seems like the calendar schedule is ignored in both cases and the threshold is the main value that is used. If so, why do we have the calendar in the first place? It looks like only the hour/minute, but not the days/months, are used. And the second and more important question ... how (the hell) do I fix/move the scheduled time to the one I want?

This is my current setup:
upload_2017-5-20_21-39-10.png


The threshold was originally set to 14; the current 12 is just to test and understand how it actually works. The last scrub was done today. The previous one was done on 7.5. for whatever reason.

Code:
~# zpool status st0rage
  pool: st0rage
 state: ONLINE
  scan: scrub repaired 0 in 3h29m with 0 errors on Sat May 20 05:29:47 2017

~# zpool status redmirror
  pool: redmirror
 state: ONLINE
  scan: scrub repaired 0 in 3h0m with 0 errors on Sat May 20 05:00:50 2017


I am basically out of ideas how to make it run on the day I want. The last thing I could try is to set one threshold to "16" (20.5. + 16 days => Monday 5.6.) and the second one to 17 (again 20.5. + 17 => Tuesday 6.6.). The problem is that I don't think it will work. I *guess* that the threshold is passed as a parameter at the time the cron job is executed, so the scrub after that will also be measured against that number (again messing with the schedule).

Code:
#minute hour  mday  month  wday  who  command
00  02  01,02,03,04,05,06,07,15,16,17,18,19,20,21  *  2  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 redmirror
00  02  01,02,03,04,05,06,07,15,16,17,18,19,20,21  *  1  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 st0rage


So if someone could put some light in it and explain how it actually works and how to force it to run at the specified time.

I am running FreeNAS-9.10.2-U1 (86c7ef5)

Thank you in advance
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Looking at your setup it appears correct for what you desire to do.

Realize that you will skip one week for a scrub on a long month such as May, you would hit 29 May (Monday) and it would not run but rather skip this week and run again 5 June.

I could swear that this has been a complaint many times before. My advice is to do a little troubleshooting...

1) Delete your Scrub jobs.
2) Create your own new cron jobs (not under Scrubs) and manually setup the scrub schedule. See how it works.
3) Look for previous complaints about the scrub not working properly.

The thing here is your CRON schedule looks proper and you should not be getting a scrub start at all on Saturday. I suspect that there is something wrong with the programming and something is firing off a scrub but it's not the cron job listed under the scrub section.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm running a little test but it will take a few days to come up with results. I'm just curious if there is something else which may fire off a scrub besides the specific entry in the crontab for the scrub.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So if someone could put some light in it and explain how it actually works and how to force it to run at the specified time.
It works as follows:

A task is run if:
  • The current day is selected on the calendar
  • AND the current time is selected on the calendar
  • AND the current weekday is selected on the calendar
  • AND it's been at least (more than? I'm not sure) threshold days.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
But it sounds like the scrub command is being called too frequently. The cron listed above is for Monday and Tuesday, why would a scrub be called on Saturday? The 2AM time was correct. The threshold is only a parameter passed to the scrub application "-t 12" but the entire command line should not be called unless the cron settings are a match. I think we all understand how crontab works. The only thing I can think of is either @HolyK has done something and has two cron jobs setup or something is calling the scrub but it's not crontab, well not directly. And we have seen other users try to get the scheduler to work properly and I believe they just ended up changing the threshold to get somewhat the desired result. I think this warrants further investigation.

@HolyK , do you have any other scrubs listed besides the freenas-boot in the /etc/crontab ?
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
Realize that you will skip one week for a scrub on a long month such as May, you would hit 29 May (Monday) and it would not run but rather skip this week and run again 5 June.
Yep, I am aware of that. From time to time it will skip two weeks because of the different month lengths. For 2017 that happens two or three times. I am fine with that.
1) Delete your Scrub jobs.
2) Create your own new cron jobs (not under Scrubs) and manually setup the scrub schedule. See how it works.
3) Look for previous complaints about the scrub not working properly.
1) I already tried deleting and re-creating them under Scrubs.
2) I'll test that. I suppose the command will be the full one I see in cron now, like: PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 redmirror
3) I was looking for similar topics, but unfortunately I haven't found any usable leads to fix the problem.
It works as follows:
A task is run if:
  • The current day is selected on the calendar
  • AND the current time is selected on the calendar
  • AND the current weekday is selected on the calendar
  • AND it's been at least (more than? I'm not sure) threshold days.
As you can see, there is something more behind this, as scrubs for both of my pools were executed today (Saturday) even though the schedule is set to Monday/Tuesday.

Regarding the "Threshold" value the documentation says: number of days since the last scrub completed before the next scrub can occur, regardless of the calendar schedule;

I am checking the content of /usr/local/libexec/nas/scrub, which basically handles all of the exceptions (e.g. no scrub is started if another one is running or if a resilver is running) before the real "zpool scrub <POOL>" is called. Part of it handles the threshold value:

Code:
# Now minus last scrub (both in seconds) converted to days.
_scrub_diff=$(expr -e \( $(date +%s) - \
    $(date -j -f %F.%T ${_last_scrub} +%s) \) / 60 / 60 / 24)
if [ ${_scrub_diff} -lt ${threshold} ]; then
    # echo "  skipping scrubbing of pool '${pool}':"
    # echo "    last scrubbing is ${_scrub_diff} days ago, threshold is set to ${threshold} days"
    return 3
fi


So based on this, the scrub is skipped (will not run) if the number of whole days elapsed since the last scrub is LESS than the threshold value, meaning the scrub IS executed once the elapsed days are equal to or greater than the threshold. So the "regardless of the calendar schedule" statement translates to: "If you set a new threshold via the GUI, you need to wait at least *that* number of days since the last scrub before the next scrub will actually execute, no matter what the new scheduled day/time is." Basically, the cron entry will just run the scrub script, which will then bail out at the threshold check.
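
To make the whole-day arithmetic concrete, here is a minimal sketch of the same comparison. The helper name scrub_due and its epoch-second arguments are made up for illustration and are not part of the FreeNAS script; the function only mirrors the "-lt" test quoted above.

```shell
#!/bin/sh
# scrub_due LAST NOW THRESHOLD   (hypothetical helper, not in the FreeNAS script)
# LAST and NOW are epoch seconds, THRESHOLD is in days.
# Prints "skip" when the whole days elapsed are less than THRESHOLD, else "run".
scrub_due() {
    # Integer division drops the partial day: 13 days 23 hours counts as 13.
    _days=$(( ($2 - $1) / 86400 ))
    if [ "$_days" -lt "$3" ]; then
        echo "skip"
    else
        echo "run"
    fi
}

scrub_due 0 $(( 14 * 86400 )) 14          # exactly 14 days: prints "run"
scrub_due 0 $(( 14 * 86400 - 3600 )) 14   # one hour short of 14 days: prints "skip"
scrub_due 0 $(( 12 * 86400 + 75600 )) 12  # 12 days 21 hours: prints "run"
```

So an elapsed time equal to the threshold already counts as due, while an interval that is only an hour short of it is still skipped.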

BTW: it is sad that all of the "echoes" describing the non-zero returns are commented out rather than redirected to /var/log/messages. Basically, we're blind if something does not execute as expected.


Anyway, in my case it looks like something else is calling these wrong scrubs OR cron is doing a very bad job executing the jobs... As I mentioned before, I have two scrub jobs, scheduled for Monday (1) and Tuesday (2).
00 02 01,02,03,04,05,06,07,15,16,17,18,19,20,21 * 2 root PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 redmirror
00 02 01,02,03,04,05,06,07,15,16,17,18,19,20,21 * 1 root PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 st0rage

But something else is executing the scrubs. Hell, I am even receiving emails about it on the wrong days!
upload_2017-5-21_0-38-6.png

starting scrub of pool 'st0rage'
starting scrub of pool 'redmirror'

Am I drunk, or is my NAS?

HolyK
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
@HolyK , do you have any other scrubs listed besides the freenas-boot in the /etc/crontab ?

Here is my full crontab. Everything was done via the GUI, no manual messing with crontab via CLI.
Code:
cat /etc/crontab
# /etc/crontab - root's crontab for FreeBSD
#
# $FreeBSD: src/etc/crontab,v 1.33.2.1 2009/08/03 08:13:06 kensmith Exp $
#
SHELL=/bin/sh
PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin
#
#minute hour  mday  month  wday  who  command
#
*/5  *  *  *  *  root  /usr/libexec/atrun > /dev/null 2>&1
#
# Save some entropy so that /dev/random can re-seed on boot.
*/11  *  *  *  *  operator /usr/libexec/save-entropy > /dev/null 2>&1
#
# Rotate log files only at midnight.
0  0  *  *  *  root  newsyslog > /dev/null 2>&1
#
# Perform daily/weekly/monthly maintenance.
1  3  *  *  *  root  periodic daily
15  4  *  *  6  root  periodic weekly
30  5  1  *  *  root  periodic monthly
#
# Adjust the time zone if the CMOS clock keeps local time, as opposed to
# UTC time.  See adjkerntz(8) for details.
1,31  0-5  *  *  *  root  adjkerntz -a > /dev/null 2>&1

0  *  *  *  *  root  /bin/sh /usr/local/sbin/save_rrds.sh > /dev/null 2>&1

5  3  *  *  *  root  /bin/sh /usr/local/bin/telemetry-cron.sh > /dev/null 2>&1

0  *  *  *  *  root  /usr/local/bin/python /usr/local/bin/mfistatus.py > /dev/null 2>&1
1,31  *  *  *  *  root  /usr/local/bin/python /usr/local/www/freenasUI/tools/alert.py > /dev/null 2>&1

15  3  *  *  *  root  /usr/local/bin/python /usr/local/www/freenasUI/tools/cachetool.py expire >/dev/null 2>&1
30  3  *  *  *  root  /usr/local/bin/python /usr/local/www/freenasUI/tools/cachetool.py fill >/dev/null 2>&1
45  3  *  *  *  root  /usr/local/bin/python /usr/local/www/freenasUI/middleware/notifier.py backup_db >/dev/null 2>&1
0  3  *  *  *  root  find /tmp/ -iname "sessionid*" -ctime +1d -delete > /dev/null 2>&1
30  */5  *  *  *  root  /etc/ix.rc.d/ix-kinit renew > /dev/null 2>&1
*/10  *  *  *  *  root  [ -s /tmp/mail.queue ] && /usr/local/bin/python /usr/local/www/freenasUI/tools/mailqueue.py
45  3  *  *  *  root  /usr/local/libexec/nas/scrub -t 14 freenas-boot
50  01  *  *  *  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /mnt/misc/scripts/backup_config.py > /dev/null 2> /dev/null
00  02  01,02,03,04,05,06,07,15,16,17,18,19,20,21  *  2  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 redmirror
00  02  01,02,03,04,05,06,07,15,16,17,18,19,20,21  *  1  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 st0rage
00  01  01  *  *  root  PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 29 misc
15  4  *  *  6  root  /usr/local/bin/python /usr/local/www/freenasUI/tools/autosnap.py > /dev/null 2>&1
6  1  *  *  *  root  /usr/local/bin/python /usr/local/www/freenasUI/tools/update_check.py > /dev/null 2>&1


EDIT: What about the
1 3 * * * root periodic daily
This basically calls "/etc/periodic/daily/800.scrub-zfs" ... it has local threshold set to 35 but who knows ...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
As you can see, there is something more behind this, as scrubs for both of my pools were executed today (Saturday) even though the schedule is set to Monday/Tuesday.
Now that I think about it, I vaguely recall this behavior showing up some time ago. It was supposed to have been filed as a bug report, but I'm not sure it was.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
The main "issue" is with the "threshold" value. Docs says:
"Threshold days - number of days since the last scrub completed before the next scrub can occur, regardless of the calendar schedule; the default is a multiple of 7 which should ensure that the scrub always occurs on the same day of the week."

IIRC, you may set the threshold to zero (0) without issue.
This is what I would do if you desire each pool to be scrubbed every week controlled only by the calendar.
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
Okay, I guess I found it (please correct me if I am wrong)... the whole problem is properly understanding how cron ACTUALLY works ...

When using both "day-of-month" and "day-of-week", the evaluated condition is NOT "day-of-month AND day-of-week"; it IS "day-of-month OR day-of-week".

Meaning ... if I use a simple cron entry like:
10 11 15,16,17 * 6 root /tmp/blah.sh

It DOES NOT mean that this script will run at 11:10 on the 15th, 16th and 17th only if it's a Saturday.

What it actually means is that the script will run at 11:10 three days in a row (15th, 16th, 17th) AND on every Saturday, no matter what day of the month it is!!

So in my case the entry is wrong ... because this...
Code:
00 02 01,02,03,04,05,06,07,15,16,17,18,19,20,21 * 2 root PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /usr/local/libexec/nas/scrub -t 12 redmirror

... actually means ... run that thing on 14 days of the month plus every Tuesday, whatever the date is ...

So what was keeping my system from going scrub-mad every day was the threshold value ... which was saying "no-no-no, not yet".

Source: https://www.freebsd.org/cgi/man.cgi?query=crontab&sektion=5&manpath=freebsd-release-ports
Note: The day of a command's execution can be specified by two fields --
day of month, and day of week. If both fields are restricted (ie, are
not *), the command will be run when either field matches the current
time. For example, ``30 4 1,15 * 5'' would cause a command to be run at
4:30 am on the 1st and 15th of each month, plus every Friday.
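
The rule quoted from crontab(5) can be sketched as a small shell function. This is only an illustration, not cron's actual code: the names in_list and cron_day_matches are invented here, and each field is assumed to be either "*" or a plain comma-separated list of numbers (no ranges or steps).

```shell
#!/bin/sh
# in_list VALUE LIST: succeed if VALUE equals one of the comma-separated numbers in LIST.
in_list() {
    for _item in $(echo "$2" | tr ',' ' '); do
        if [ "$_item" -eq "$1" ]; then
            return 0
        fi
    done
    return 1
}

# cron_day_matches DOM_FIELD DOW_FIELD DOM DOW   (hypothetical helper)
# Succeeds if a crontab entry with these day fields would match the given date.
cron_day_matches() {
    if [ "$1" = "*" ] && [ "$2" = "*" ]; then
        return 0                              # neither field restricted
    elif [ "$1" = "*" ]; then
        in_list "$4" "$2"                     # only day-of-week restricted
    elif [ "$2" = "*" ]; then
        in_list "$3" "$1"                     # only day-of-month restricted
    else
        in_list "$3" "$1" || in_list "$4" "$2"   # both restricted: EITHER may match
    fi
}

DOM_LIST="01,02,03,04,05,06,07,15,16,17,18,19,20,21"
# Saturday 20 May 2017: day-of-month 20, weekday 6; the entry restricts the weekday to 2.
if cron_day_matches "$DOM_LIST" 2 20 6; then echo "fires"; else echo "does not fire"; fi  # prints "fires"
# Same date with day-of-month left as "*": only the weekday counts.
if cron_day_matches "*" 2 20 6; then echo "fires"; else echo "does not fire"; fi          # prints "does not fire"
```

With both fields restricted, Saturday the 20th matches through the day-of-month list even though the weekday field says Tuesday, which is consistent with the Saturday scrubs reported earlier in the thread.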

Lesson learned ... Do not believe what is on the internet, even if it is in multiple places. When I was looking for a way to schedule a cron job bi-weekly, this was supposed to be the way: specify the days of every second week in the month and restrict the execution by day of week. Obviously this is bulls*it ...

So I guess the simple way of doing it is to leave the "day-of-month" unrestricted (*) and specify only the day-of-week, with the threshold set to 14. The cron job is then called every week, but the scrub script suppresses every second run.

The other way is to wrap the scrub script in another script that checks the day of the month, and schedule that wrapper in cron for the desired day-of-week; its internal check will either exit or call the scrub script, depending on what day of the month it is.
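
A minimal sketch of such a wrapper, assuming the same day ranges as in the GUI-generated entries (1-7 and 15-21). The helper name run_in_scrub_week is hypothetical, and the real scrub call is left as a comment so the sketch has no side effects.

```shell
#!/bin/sh
# run_in_scrub_week DOM COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND only when DOM (day of month, e.g. from `date +%d`) falls in
# 1-7 or 15-21; on any other day it does nothing and returns success.
run_in_scrub_week() {
    _dom=${1#0}   # strip one leading zero: "07" -> "7"
    shift
    if [ "$_dom" -le 7 ] || { [ "$_dom" -ge 15 ] && [ "$_dom" -le 21 ]; }; then
        "$@"
    fi
}

# Intended use inside the wrapper script:
# run_in_scrub_week "$(date +%d)" /usr/local/libexec/nas/scrub -t 12 st0rage

run_in_scrub_week 20 echo "day 20: would scrub"   # prints the message
run_in_scrub_week 24 echo "day 24: would scrub"   # prints nothing
```

The wrapper would then be scheduled with the day-of-month field left as "*" and only the weekday set, so cron's either-field rule never comes into play.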
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This definitely needs a better interface in the new GUI. That said...

It DOES NOT mean that this script will run at 11:10 on the 15th, 16th and 17th only if it's a Saturday.

What it actually means is that the script will run at 11:10 three days in a row (15th, 16th, 17th) AND on every Saturday, no matter what day of the month it is!!
My experience is the opposite. I have all weekdays checked and yet the scrub always runs on the 8th and on the 24th of each month.
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
This definitely needs a better interface in the new GUI. That said...
My experience is the opposite. I have all weekdays checked and yet the scrub always runs on the 8th and on the 24th of each month.

I suppose that "all weekdays checked" = "*" in the crontab ... therefore it works as it should ... If you read the crontab manual carefully you will .... well ... "sh!t bricks"
Note: The day of a command's execution can be specified by two fields -- day of month, and day of week. If both fields are restricted (ie, are not *), the command will be run when either field matches the current time. For example, ``30 4 1,15 * 5'' would cause a command to be run at 4:30 am on the 1st and 15th of each month, plus every Friday.
So if one of the fields IS "*", then it runs only as specified in the other one (the 8th and 24th in your case).

Whoever invented this was on meth or something...

And yes, we definitely need better UI for this :]
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I feel like killing the author of cron (please don't be ken, dmr or bwk).

Ah, crap, it's ken. Off I go on a pilgrimage to atone for my sinful thoughts about Unix.
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
Ok so here it comes ... simple workaround for the bi-weekly schedule ...

0 2 * * 1 root test `echo "$(date +%U) % 2" | bc` -eq 1 && /usr/local/libexec/nas/scrub -t 14 st0rage
0 2 * * 2 root test `echo "$(date +%U) % 2" | bc` -eq 1 && /usr/local/libexec/nas/scrub -t 14 redmirror


It basically checks whether the current week number is odd or even. If it is odd, it runs the scrub. This ensures bi-weekly scrub execution no matter how many days each month has. The only issue is when a year has an odd number of weeks: the scrub will run in the last week of the year and again in the first week of the next year. But I guess that is OK.
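
For reference, the parity check can also be written without bc. The sketch below is an assumption-laden illustration, not the method from the entries above: week_parity and epoch_week_parity are invented names, and the second function swaps in a different technique (counting 7-day blocks since the Unix epoch) to sidestep the year-boundary quirk.

```shell
#!/bin/sh
# week_parity WEEK: print 0 for an even week number, 1 for an odd one.
# WEEK is the two-digit output of `date +%U` (00-53).
week_parity() {
    _week=${1#0}        # "08" -> "8"; $(( )) would read a leading 0 as octal
    _week=${_week:-0}   # a bare "0" becomes empty above, so restore it
    echo $(( $_week % 2 ))
}

# epoch_week_parity SECONDS: parity of the number of full 7-day blocks since
# the Unix epoch. The count does not restart on 1 January, but the blocks
# begin on Thursdays (UTC), not on Sundays.
epoch_week_parity() {
    echo $(( $1 / 604800 % 2 ))
}

week_parity 21   # prints 1
week_parity 08   # prints 0
# In a wrapper script (scrub call shown only as a comment):
# [ "$(week_parity "$(date +%U)")" -eq 1 ] && /usr/local/libexec/nas/scrub -t 14 st0rage
```

Because the epoch-based blocks flip on Thursdays, that variant only makes sense for jobs scheduled well away from the Thursday boundary, such as the Monday and Tuesday entries here.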

BTW: I was about to use the same logic for the smartctl long tests, but for even weeks ... but they're not in the crontab?! Where is the smartctl schedule actually stored?

Currently i have:
upload_2017-5-21_13-20-32.png


Which is again wrong due to the day-of-week vs day-of-month thing. But where can I find this?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Ok so here it comes ... simple workaround for the bi-weekly schedule ...

0 2 * * 1 root test `echo "$(date +%U) % 2" | bc` -eq 1 && /usr/local/libexec/nas/scrub -t 14 st0rage
0 2 * * 2 root test `echo "$(date +%U) % 2" | bc` -eq 1 && /usr/local/libexec/nas/scrub -t 14 redmirror


It basically checks whether the current week number is odd or even. If it is odd, it runs the scrub. This ensures bi-weekly scrub execution no matter how many days each month has. The only issue is when a year has an odd number of weeks: the scrub will run in the last week of the year and again in the first week of the next year. But I guess that is OK.

BTW: I was about to use the same logic for the smartctl long tests, but for even weeks ... but they're not in the crontab?! Where is the smartctl schedule actually stored?

Currently i have:
View attachment 18447

Which is again wrong due to the day-of-week vs day-of-month thing. But where can I find this?
I must say "Wow" that it is CRON causing the issues. I had no idea it was an "or" statement until this thread and then I read the cron description closer and sure enough, the weekday is the only "OR" condition.

As for the new cron job, did you add that via the GUI as just a new cron job or edit crontab? I ask so that it's retained during an upgrade/reboot condition. You can do the same with the SMART tests, just create a new pair of cron jobs, but I'm not sure if you would get an email if it failed. You may want to run a script to do this work. There are several scripts on the forums to accomplish this.

And nice work around.

I updated bug report 20230 and hopefully this will be addressed.
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
As for the new cron job, did you add that via the GUI as just a new cron job or edit crontab? I ask so that it's retained during an upgrade/reboot condition.
I added it via the GUI, but I just found that it will not work because the GUI is messing with the command string.

If I use the "workaround" directly in the console, it works:
[root@HolyNAS] ~# sh
# test `echo "$(date +%U) % 2" | bc` -eq 1 && echo "OK"
OK


But if I add the same via the GUI cron job, the " % " gets escaped automatically:
[root@HolyNAS] ~# cat /etc/crontab | grep OK
00 02 * * 3 root PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" test `echo "$(date +\%U) \% 2" | bc` -eq 1 && echo "OK"

See the extra " \ " .... "$(date +\%U) \% 2"

Sadly, this one does not work anymore:
[root@HolyNAS] ~# sh
# test `echo "$(date +\%U) \% 2" | bc` -eq 1 && echo "OK"
test: bc:: unexpected operator


It seems like the cron jobs GUI requires some polishing.

I updated bug report 20230 and hopefully this will be addressed.
Thanks!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
But you could simply create a script and then call that script from a cron job. Just put the scripts on your pool.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Code:
#!/bin/sh
# Run the scrub only in odd-numbered weeks (%U = week of the year, Sunday as first day).
test `echo "$(date +%U) % 2" | bc` -eq 1 && /usr/local/libexec/nas/scrub -t 14 st0rage


Make sure to chmod +x the script and then call it from the cron job.
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
Yes, that's what I will need to do. I really thought that I could manage it easily via one cron entry per scrub, but apparently it keeps throwing sticks under my feet.

BTW, I've found a two-year-old bug regarding the escaping of % ... https://bugs.freenas.org/issues/9091 ... I'll raise a new one since the issue is still there...

EDIT: New bug report for the wrong escape in cron entry - #24084
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
One more thing regarding the SMART tests ... does anyone know where the schedule is actually stored? I have four SMART schedules, but they're not in crontab.
upload_2017-5-21_20-2-1.png


I guess I will get rid of these as well and make my own schedule as you suggested, at least for the long tests, to ensure that they will not hit the same window as the scrub ...
 