Scrub and SMART testing schedules

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So you're asking what schedule to set for scrubs and SMART testing. Well, here are a few tidbits of info:

1. Scrubs are your "regular" maintenance for zpools. They can take a few minutes to a few days depending on the size of your pool, the performance of your pool, your pool's data storage history, the performance of your system as a whole, and the workload placed on your pool during the scrub.
2. SMART tests are internal drive tests. There is no 'criteria' for what is or isn't done in a particular test. No doubt each manufacturer has its own specifications for what a "short test" and a "long test" entail. Generally, short tests take less than 5 minutes and long tests take hours. Long tests usually read the entire disk surface to check for errors, while short tests do a very simple and quick check.
3. Don't try to schedule SMART tests at the same time as scrubs. It doesn't end well. Your disk can't do a scrub, a SMART test, and handle regular tasks at the same time very well.
4. SMART tests are non-destructive, so you can run them as often as you want. But you can only run one test at a time per disk (duh?).
5. SMART tests don't announce their results on their own. If you've set up your FreeNAS box properly it will email you if a SMART test fails. So no email means everything is good.
6. Your average disk will store the last 20 or so test results. So if you do tests at a very high frequency and one test fails it may be removed from the log before you can even examine it closely.
7. You can blindly steal my schedule or come up with your own. Since this is for a home server and I'm the only user I don't worry about any performance penalty since my pool performs more than adequately even during a scrub.
8. Scrubs are pretty hard on disks. So scheduling them at a frequency that makes you comfortable with your pool is important.
9. Do not confuse SMART monitoring with SMART testing. One actively runs tests; the other only watches for errors the drive finds during regular use.
10. If you are running SSDs, these tests are almost pointless. Do them if you want, but they're not really functioning in the capacity that you'd expect. Both short and long tests typically take seconds to complete for most brands of SSD. So it's obvious that a long test doesn't actually read every memory cell looking for errors.
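
To put a number on item 6, here is a rough sketch in Python. The 20-entry log size is just the "20 or so" figure from above (check your own drive), the 30-day month is a simplification, and the function name is made up for illustration.

```python
def days_until_overwritten(log_entries, tests_per_month, days_per_month=30.0):
    """Roughly how many days a result stays in the drive's self-test log
    before newer tests push it out (assumes tests are evenly spaced)."""
    if tests_per_month <= 0:
        raise ValueError("tests_per_month must be positive")
    return log_entries * days_per_month / tests_per_month


# 4 short + 2 long tests per month, versus one short test every day.
print(days_until_overwritten(20, 6))
print(days_until_overwritten(20, 30))
```

With a handful of tests per month a failed result sticks around for months; with daily tests it is gone in a few weeks.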

So here's my schedule:

SCRUBS: 1st and 15th of the month at 4am. Threshold is set to 10 days.

SHORT SMART TEST: Every 5th, 12th, 19th, and 26th of the month at 3am.

LONG SMART TEST: Every 8th and 22nd at 4am.

SHORT SMART TEST alternate: Some people do every odd or even day and choose a time where it would never interfere with a scrub or long test. For example, my scrubs and long tests are scheduled for 4am. So if I do 3am for short tests I could theoretically do one every single day and not have any conflict in the schedule, since the test takes 2 minutes.

If you look at my schedule, I never schedule anything on or after the 28th. This is because months have different numbers of days: anything scheduled on the 29th, 30th, or 31st will be skipped in some months. So instead of trying to deal with it I simply don't schedule anything then. Yes, this means that between the 26th of one month and the 1st of the next I don't really do any tests. But to be frank, if you are expecting things to go horribly wrong because you didn't do a test for 5 days, you've got bigger problems and should reconsider your design.
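
A quick way to see which days of the month are "safe" is to ask the calendar. This is a minimal Python sketch; the function name is made up, and 2015 is only an example of a non-leap year.

```python
import calendar


def months_skipped(day, year):
    """Months (1-12) of the given year in which this day of the month
    does not exist, so a job scheduled on it would not run."""
    return [m for m in range(1, 13) if calendar.monthrange(year, m)[1] < day]


for day in (26, 28, 29, 30, 31):
    print(day, months_skipped(day, 2015))
```

An empty list means the day exists in every month of that year; anything else is a month where the task would be skipped.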

There is no right or wrong schedule. If you want to do scrubs every single day you can. It's a bit excessive in my opinion. It may also cause premature failure of your disks because of the extra wear and tear.

If you want to see how a test is doing, the appropriate command is something like:

# smartctl -a /dev/da1

The output is quite long, but a few sections are useful:

Code:
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 482) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.


That is how long Short, Long (Extended), and Conveyance tests are estimated to take if the disk is completely idle.
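
If you'd rather pull those numbers out with a script, something like the following Python sketch works on output shaped like the excerpt above. The sample text is copied from that excerpt, the regex is an assumption about the layout (other drives or smartctl versions may print it differently), and the function name is made up.

```python
import re

SAMPLE = """\
Short self-test routine
recommended polling time:        (  2) minutes.
Extended self-test routine
recommended polling time:        ( 482) minutes.
Conveyance self-test routine
recommended polling time:        (  5) minutes.
"""

# A "<Name> self-test routine" line followed by its polling-time line.
PATTERN = re.compile(
    r"^(\w+) self-test routine\s*\n\s*recommended polling time:\s*\(\s*(\d+)\)\s*minutes",
    re.MULTILINE,
)


def polling_minutes(text):
    """Map each test name (lowercased) to its estimated duration in minutes."""
    return {name.lower(): int(minutes) for name, minutes in PATTERN.findall(text)}


print(polling_minutes(SAMPLE))
```

For the drive above, the extended test estimate of 482 minutes is roughly 8 hours.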

Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     32310         -
# 2  Extended offline    Completed without error       00%     32270         -
# 3  Short offline       Completed without error       00%     32262         -
# 4  Short offline       Completed without error       00%     32214         -
# 5  Short offline       Completed without error       00%     32166         -
...


This tells you which tests have completed and when, measured in power-on hours. To compare against the drive's current power-on hours, check the attribute table earlier in the output (Power_On_Hours). If a test is in progress it may show up at the top of this log. Some brands will not actually provide an entry until the test completes or fails.
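
To turn the log into "how long ago was each test", you can subtract each LifeTime value from the drive's current power-on hours. A minimal Python sketch, assuming the column layout shown above (columns separated by two or more spaces); the function names and the 32320 power-on-hours figure are made up for illustration.

```python
import re

SAMPLE_LOG = """\
# 1  Short offline       Completed without error       00%     32310         -
# 2  Extended offline    Completed without error       00%     32270         -
# 3  Short offline       Completed without error       00%     32262         -
"""

# num, description, status, remaining %, lifetime hours, LBA of first error
ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s{2,}(\d+)%\s+(\d+)\s+(\S+)\s*$",
    re.MULTILINE,
)


def parse_selftest_log(text):
    """Return one dict per self-test log row, newest first as printed."""
    rows = []
    for num, test, status, remaining, hours, lba in ROW.findall(text):
        rows.append({
            "num": int(num),
            "test": test,
            "status": status,
            "remaining_pct": int(remaining),
            "lifetime_hours": int(hours),
            "lba": lba,
        })
    return rows


def hours_since(rows, power_on_hours):
    """Pair each test with the power-on hours elapsed since it ran."""
    return [(r["test"], power_on_hours - r["lifetime_hours"]) for r in rows]


rows = parse_selftest_log(SAMPLE_LOG)
print(hours_since(rows, 32320))  # 32320 is a hypothetical current value
```

Keep in mind these are power-on hours, so they only match wall-clock time if the drive runs 24/7.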

If your drive fails a test it usually qualifies for an RMA.

No doubt others will provide their configurations.

Good luck and happy storing!
 

alexg

Contributor
Joined
Nov 29, 2013
Messages
197
Do you recommend staggering smart tests so each disk doesn't overlap?

How about multiple pool scrubs?
 

cyberjock

I do all scrubs on all of my pools at the same time.

I also see no reason not to do all of the SMART tests at the same time. If you want to stagger them, you can. It means your server will need more "quiet time" to do its SMART tests and scrubs though. I try to either have it do all it can do for tests, or be online and 100%.
 

budmannxx

Contributor
Joined
Sep 7, 2011
Messages
120
alexg said: "Do you recommend staggering smart tests so each disk doesn't overlap?"
How many disks do you have? I've run simultaneous long tests on my 6 disks with no noticeable issues. Maybe a temperature increase of a couple of degrees, but not enough for me to worry.

alexg said: "How about multiple pool scrubs?"
I only have one pool, so I can't speak to this.
 

alexg

Cyberjock,

How about the PULL or PUSH side of replication during scrubs and long SMART tests? Any advice on which ones to avoid running at the same time?

Thanks
 

cyberjock

You want to keep server load to a minimum while doing scrubs or long tests. That's about the only advice I can give.
 

Robert Smith

Patron
Joined
May 4, 2014
Messages
270
cyberjock said: "SCRUBS: 1st and 15th of the month at 4am. Threshold is set to 10 days."

Since there are more than 10 days between the 1st and the 15th of the month, wouldn't a threshold of 10 days trigger another scrub in between the two?

Or am I misunderstanding how thresholds work? Please enlighten me.


Thank you.

EDIT: Ok, the documentation is a little confusing, but I think I figured it out. Threshold actually means the opposite of what I originally thought it did.

Threshold is the minimum number of days that must pass after the previous scrub completes before a new scrub is allowed to run.
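
As I understand it, the check boils down to something like this. It is only a Python sketch of the rule as described above, not the actual FreeNAS code; the function name and the exact >= comparison are assumptions.

```python
from datetime import date


def scrub_should_run(today, last_scrub, threshold_days):
    """True if a scheduled scrub may start: either the pool has never been
    scrubbed, or at least threshold_days have passed since the last one."""
    if last_scrub is None:
        return True
    return (today - last_scrub).days >= threshold_days


# Last scrub finished on the 1st, threshold of 10 days.
print(scrub_should_run(date(2015, 6, 15), date(2015, 6, 1), 10))  # schedule fires on the 15th
print(scrub_should_run(date(2015, 6, 8), date(2015, 6, 1), 10))   # hypothetical extra trigger on the 8th
```

So the threshold never starts a scrub by itself; it can only hold back one that the schedule asked for too soon.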
 

camilo suarez

Explorer
Joined
Feb 28, 2014
Messages
86
Noob question: how do I view all the info that "smartctl -a /dev/da1" gives in the shell console? I can only see the last lines.
 

camilo suarez

Another question: what does this lifetime mean? I've searched Google and I can't find an answer I can trust.
 

Attachment: lifetime.JPG

cyberjock

You really should be googling all these SMART questions. There's a couple of very detailed pages that make a good primer for stuff like this. ;) Asking in a how-to guide thread is not appropriate.
 

Robert Smith

Quote: "doesn't work for me, I only get '~' on console :("

You can also pipe the output to less, which will stop after the screen fills, and let you scroll with the “d” and “b” keys.

Piping works by adding the pipe character "|" and the next command at the end of the previous command:

smartctl -a /dev/da1 | less


camilo suarez said: "Another question: what does this lifetime mean? I've searched Google and I can't find an answer I can trust."

That is how old the drive was, in power-on hours, at the time each test ran. How that number is calculated is up to the manufacturer.
 

camilo suarez

Explorer
Joined
Feb 28, 2014
Messages
86
Awesome, it works perfectly with | less. So the table that shows the lifetime lists the newest test first.

thanks for your help.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,374
Hi Cyberjock,

I really find the scheduling UI to be quite confusing

I can choose
"Every N hour" or "Each Selected Hour"
"Every N day of month" or "Each Selected day of month"
I can then tick the months it will occur and finally and most confusingly the days (at the very bottom)

So with a scrub, I really want these to occur, right? They are important, yes? they also take a long time and the machine wants to be 'left alone' when I do them.

So if I select every 15th of the month at 2am, I get confused by the day selection at the bottom. If I untick all but Sunday, would I be right in thinking that it will only occur if the 15th falls on a Sunday?
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
diskdiddler said: "So if I select every 15th of the month at 2am, I get confused by the day selection at the bottom. If I untick all but Sunday, would I be right in thinking that it will only occur if the 15th falls on a Sunday?"

I'm not 100% certain, but my guess is the GUI is a front end for the crontab. All the options are cron options after all. Based off of that, yes your understanding would be correct.

https://en.wikipedia.org/wiki/Cron
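
If the GUI really does hand these fields straight to cron, it is worth double-checking the Sunday case. The crontab(5) man page says that when both the day-of-month and day-of-week fields are restricted (not *), the job runs when either one matches, not only when both do. Whether the FreeNAS GUI follows that rule is an assumption here, so treat this as a Python sketch of standard cron day matching, with a made-up function name:

```python
from datetime import date


def cron_day_matches(d, days_of_month=None, weekdays=None):
    """Day matching as described in crontab(5).

    None stands for '*' (unrestricted). weekdays uses cron numbering:
    0 = Sunday ... 6 = Saturday.
    """
    cron_weekday = (d.weekday() + 1) % 7  # Python has Monday=0, cron has Sunday=0
    dom_restricted = days_of_month is not None
    dow_restricted = weekdays is not None
    dom_hit = dom_restricted and d.day in days_of_month
    dow_hit = dow_restricted and cron_weekday in weekdays
    if dom_restricted and dow_restricted:
        return dom_hit or dow_hit  # either field matching is enough
    if dom_restricted:
        return dom_hit
    if dow_restricted:
        return dow_hit
    return True


# "15th of the month" combined with "Sunday only":
print(cron_day_matches(date(2015, 4, 15), {15}, {0}))  # a Wednesday the 15th
print(cron_day_matches(date(2015, 4, 12), {15}, {0}))  # a Sunday the 12th
print(cron_day_matches(date(2015, 4, 14), {15}, {0}))  # a Tuesday the 14th
```

Under that rule, ticking only Sunday next to "15th" would mean "the 15th and also every Sunday", so leaving all weekdays ticked is the safer way to get just the 15th.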
 

diskdiddler

Yeah, thanks for the reply. It's an odd UI but I think I've got it now; it looks powerful but confusing.
I think it simply needs more AND / OR / IF written in spots, or a clear summary in text form of what you've set up.
 