Can't view previous SMART test logs & is it in sequence?

Status
Not open for further replies.

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
I've got a disk I suspect has an issue: my server is making a ticking noise, and it sounds like one of the actuators is rocking back and forth, as if it's seeking sector 1000, 2000, 1000, 2000, 1000, 2000.
It's very reminiscent of the WDIDLE3 issue on some of the earlier WD Greens with bad firmware. (I have a 500 GB drive here which is unusable; any time there's NOT system-requested disk activity, the disk will make ticking noises endlessly, despite still working.)

ANYHOW
I'd like to see the last SMART tests and what information came from them. I've found the area to configure the scheduled jobs, but no way to view the old reports; do they not get archived or anything? Do you only get an email if there's an error? (I'm unsure.)

Also, if I do kick off a SMART long test, surely it runs per disk in sequence, right, so the system is still usable? I believe a long test generally tests the entire surface of the disk?

Finally, how do I manually initiate a test? Do I need to manually set up a new scheduled job? I can't see a "run now" button on the configuration page for the schedules.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK,

I'll take this.

Your drives are almost certainly /dev/adaN where N is some number. You can get the full SMART readout with:
Code:
smartctl -x /dev/ada0
or whatever device you want.

Somewhere in there, you should see the results of the last several SMART tests. "Completed without error" is the key phrase you will see in the log. If you have email set up correctly, my understanding is that whenever your SMART test does not complete without error, you'll be emailed.

Using smartctl, you can initiate your own tests anytime you want. I believe the syntax is
Code:
smartctl -t long /dev/ada0
for example to kick off a long test, but you'll need to Google that to make sure. And so on.
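If you only want the self-test history rather than the full `-x` dump, smartctl also has `smartctl -l selftest /dev/ada0`. A sketch of scanning that log for failures, using a canned sample here since no drive is attached (the sample lines are illustrative, not from a real drive):

```shell
# Illustrative sample of a self-test log; on a live system this text
# would come from: smartctl -l selftest /dev/ada0
cat <<'EOF' > /tmp/selftest.log
Num  Test_Description    Status                  Remaining  LifeTime(hours)
# 1  Extended offline    Completed without error       00%      21210
# 2  Short offline       Completed without error       00%      21100
EOF
# "Completed without error" is the phrase to look for; anything else
# in the Status column deserves attention.
grep -c 'Completed without error' /tmp/selftest.log
```

On this sample, the count is 2, i.e. both logged tests passed.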

This should be enough information to get you started up.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
So just to clarify, there's no way to kick it off manually through the GUI?
I'm happy to shell in and do it - thought it might be available there though.

Do you know if the scripted SMART long check (mine is bi-monthly) will sequentially do the disks (surely it would...?)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You don't really need to "view" the results of the tests. The SMART monitoring will check your logs and if you get anything except a "passed with no errors" then you get a nastygram email telling you that your server needs administration.

Long checks are totally dependent on the manufacturer and they are under no obligation to check the entire disk. However, I will tell you that every platter-based disk I have ever seen does a full platter check.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Do you know if the scripted SMART long check (mine is bi-monthly) will sequentially do the disks (surely it would...?)
No, it does not, and there's no need to.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ideally, a well-designed long test is not supposed to busy out the drive, so it shouldn't substantially affect the performance characteristics of the drive. What happens on a drive that is already 100% saturated and doesn't actually have free IOPS is one of the possible areas of concern; it's been quite some time since I played with failure modes here but I think the drive actually timed out the long test with some ambiguous-sounding error. The idea is that it should always be safe to do a long test without it slagging out your I/O.

So, no, the smartd test will just run them all at the same time and rely on the drives to behave properly.

Note also that this is similar to the ZFS scrub behaviour where it ensures that the drives are not really busy before slagging them out with scrub traffic (I think the current algorithm is actually something like wait-for-drive-having-been-in-idle-state-at-least-4-seconds).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So just to clarify, there's no way to kick it off manually through the GUI?

Not that I'm aware of. Both this and the ability to review current SMART output would be nice additions.

I know the prevailing school of thought on the bugtracker (primarily by Princess Toothy Guard Dog) seems to be "it'll email you" but the reality is that when gear is on the bench, the machine isn't in its final production network location and so may not be ABLE to e-mail (may fail DNS fwd/rev validation tests, etc). The current setup is fine for hobbyists but I would like to be able to actually request the system launch a conveyance test, which is basically something that only happens maybe once or twice in a drive's lifetime, or review current stats without logging in to the CLI. Well actually *I* don't care since I'm a CLI guy but I find it difficult to advise people to be doing these things that aren't available except through the CLI.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Not that I'm aware of. Both this and the ability to review current SMART output would be nice additions.

I know the prevailing school of thought on the bugtracker (primarily by Princess Toothy Guard Dog) seems to be "it'll email you" but the reality is that when gear is on the bench, the machine isn't in its final production network location and so may not be ABLE to e-mail (may fail DNS fwd/rev validation tests, etc). The current setup is fine for hobbyists but I would like to be able to actually request the system launch a conveyance test, which is basically something that only happens maybe once or twice in a drive's lifetime, or review current stats without logging in to the CLI. Well actually *I* don't care since I'm a CLI guy but I find it difficult to advise people to be doing these things that aren't available except through the CLI.

While I am by no means anything more than an amateur programmer, I can't imagine something like pfSense's solution (GUI section to run SMART tests and view results, results displayed are simply the output of smartctl -a /dev/whatever piped into a fixed-spacing font HTML element) being particularly hard to implement.

It would certainly be one less reason to use the CLI, particularly during initial setup and validation - not a bad idea in an appliance.
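A rough sketch of that pfSense-style approach, assuming the smartctl output has already been captured to a file (the attribute lines below are made up for illustration):

```shell
# Made-up stand-in for captured `smartctl -a` output
cat <<'EOF' > /tmp/smart_raw.txt
ID# ATTRIBUTE_NAME          VALUE WORST THRESH RAW_VALUE
  5 Reallocated_Sector_Ct   100   100   036    0
197 Current_Pending_Sector  100   100   000    0
EOF
# HTML-escape the text (& first, so entities aren't double-escaped),
# then wrap it in <pre> so the fixed-width columns line up in a browser
{
  printf '<pre>\n'
  sed 's/&/\&amp;/g; s/</\&lt;/g; s/>/\&gt;/g' /tmp/smart_raw.txt
  printf '</pre>\n'
} > /tmp/smart_view.html
```

That really is the whole trick: no parsing, just escaping and a fixed-spacing element.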
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
While I am by no means anything more than an amateur programmer, I can't imagine something like pfSense's solution (GUI section to run SMART tests and view results, results displayed are simply the output of smartctl -a /dev/whatever piped into a fixed-spacing font HTML element) being particularly hard to implement.

It would certainly be one less reason to use the CLI, particularly during initial setup and validation - not a bad idea in an appliance.

That's exactly my thinking. It'd be cool to go further with the whole SMART thing... since we seem to lack the ability to detect problems in other ways.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I will tell you that querying lots of disks can take minutes to do. My 24 disk server takes more than 2 minutes to query all of the disks. And for bigger servers I can only shiver at how long *that* would take.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The problem with querying all the drives, or even a single drive, is knowing what will be returned from each one: you'd need to know what each drive model could possibly return and be able to handle the non-standard return values. And if you provide a simple GUI way to return, say, "smartctl -x /dev/adaX", someone will want it filtered to only show test results or possible error issues. There is no way to meet everyone's request, well, unless you make a user-customizable filter. Hmm... maybe that would work. I'm sure someone could write a simple (maybe not that simple) script to do all that.
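A sketch of that customizable-filter idea, run against a canned sample (the attribute values are invented); `FILTER` is a user-supplied pattern, here tuned to the attributes most people watch:

```shell
# Invented sample of `smartctl -x` attribute output
cat <<'EOF' > /tmp/smart_full.txt
  1 Raw_Read_Error_Rate     118   099   006    0
  5 Reallocated_Sector_Ct   100   100   036    0
194 Temperature_Celsius     032   045   000    32
197 Current_Pending_Sector  100   100   000    0
198 Offline_Uncorrectable   100   100   000    0
EOF
# The filter is just a user-editable extended regex; change it to taste
FILTER='Reallocated|Pending|Uncorrectable'
grep -E "$FILTER" /tmp/smart_full.txt
```

On the sample above this prints the three reallocation/pending/uncorrectable lines and drops the rest, which is roughly the per-user filtering being described.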

@OP
If your drive is still ticking, post the output of "smartctl -a /dev/adaX" but use www.pastebin.com or code brackets to retain the output format.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That is simply an argument against polling in realtime, Princess...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Drives are polled every 30 minutes anyway, right? Storing the output shouldn't be too hard...
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
I will tell you that querying lots of disks can take minutes to do. My 24 disk server takes more than 2 minutes to query all of the disks. And for bigger servers I can only shiver at how long *that* would take.
No, it does not, and there's no need to.

I'm referring to a SMART long test here, which could take anywhere from 2 to 18 hours depending on the disk size, since it checks every single sector of the disk (AFAIK), hence making the disk EXTREMELY busy and difficult to use for regular access.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm referring to a SMART long test here, which could take anywhere from 2 to 18 hours depending on the disk size, since it checks every single sector of the disk (AFAIK), hence making the disk EXTREMELY busy and difficult to use for regular access.

No, it doesn't make the disk extremely busy and difficult to use for regular access. In fact, most companies have put the performance penalty at <10%. Many will stop a SMART test temporarily when disk activity is requested.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
Yes, but if it's in a 6-disk array, a single byte written to or read from the array will delay the operation, right?
Am I right in thinking it does check the entire surface?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
On a potential disk write it's cached by ZFS unless it's a sync write. And if you have that many sync writes you probably have a ZIL. So no problem there.

On a potential disk read it's either cached by ZFS (so no problem) or it has to be obtained from the pool (which will take milliseconds to do). So are you going to argue that the additional time it takes to actually perform the read handicapped your pool?

If something as simple as a second read operation is so detrimental to a pool's performance you've got bigger problems. Because you probably have multiple read operations coming in all day long every day. ;)

This is such a small part of the bigger picture its almost laughable to be discussing it. It's like pissing in the ocean and arguing that the level went up. Sure, it went up. Is anyone going to actually give enough of a crap to even think about measuring it? Hell no.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
We're clearly not on the same page here.
If I drop 35 GB of data onto my system across the LAN at 100 MB/s, I'm going to assume that any single disk will be taking at least 5 GB of those writes (and likely more for redundancy, probably more like 8.7 GB).
How is a hard disk meant to sequentially read each sector of the entire disk while also writing 8.7 GB of data? It's going to _seriously_ impact performance, and considering a full surface read is a multi-hour event (it was 12 hours last time I did one on my 3 TB disks, let alone this batch of 5s I have), I figure it's going to be pretty nasty.

I'm under the impression that a long SMART check is fairly similar to a CHKDSK /R under Windows.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Right, and it's exactly like I said above. The SMART test is interrupted temporarily while the disk handles the writes. So aside from the fact that the disk will have to be interrupted a bunch of times, not much is lost. And since ZFS handles writes in large chunks every 4-6 seconds you won't actually have as many interruptions as you think.

Hence, the performance penalty is very small.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
OK, well, I've kicked off smartctl -t long /dev/ada5 and I'll work backwards; it's claiming 9 hours per disk.
Interestingly it says
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===

So does this mean the disk does in fact go out of sync from the array for the test, before needing... umm, what's it called? The thing where the data is repaired?
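For what it's worth, a SMART self-test runs inside the drive's own firmware; the disk stays online in the pool, so no rebuild is involved, and the progress of a running long test shows up in the regular smartctl output. A sketch over a canned sample, since the exact wording and percentage vary by drive (this excerpt is illustrative, not from /dev/ada5):

```shell
# Illustrative excerpt of `smartctl -a /dev/ada5` while a long test runs
cat <<'EOF' > /tmp/smart_status.txt
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
EOF
# Pull out just the remaining percentage
grep -o '[0-9]*% of test remaining' /tmp/smart_status.txt
```

On this sample the grep prints "90% of test remaining", which you can poll occasionally to see how far along the test is.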
 