Can't view previous SMART test logs &is it in sequence?

jgreco · Sep 5, 2014

Pay attention. I already explained above that the SMART test is not slagging out the drive. Then Cyberjock also explained the same thing.

IT IS NOT A PROBLEM BECAUSE IT DOESN'T WORK THE WAY YOU SEEM TO THINK.

joeschmuck · Sep 5, 2014

diskdiddler said:
Ok well I've kicked off smartctl -t long /dev/ada5 and I'll work backwards, it's claiming 9 hours per disk
Interestingly it says

So does this mean the disk does infact go out of sync from the array for the test? before needing ... umm what's it called? The thing where the data is repaired?

A long test does not take the drive offline so it will not be out of sync. To run Offline tests you need to use the -offline parameter, assuming your drive even supports offline testing. The term you were looking for was SCRUB.

joeschmuck · Sep 5, 2014

Ericloewe said:
Drives are polled every 30 minutes anyway, right? Storing the output shouldn't be too hard...

This is news to me, what are they polled for? Presence, Health Status? It would have to be something that doesn't spin them up because you can spin them down and leave them down if you configure your NAS properly for that type of action.

cyberjock · Sep 5, 2014

The default is 30 minutes per the WebGUI. It's under the SMART service settings called "check interval".

You are correct that polling them doesn't wake them up, most of the time. Some brands and models do but most don't. I'm pretty sure WD don't spinup and I'm pretty sure that Toshiba drives do.

joeschmuck · Sep 5, 2014

cyberjock said:
The default is 30 minutes per the WebGUI. It's under the SMART service settings called "check interval".

You are correct that polling them doesn't wake them up, most of the time. Some brands and models do but most don't. I'm pretty sure WD don't spinup and I'm pretty sure that Toshiba drives do.

Arg! It's been so long since I configured FreeNAS in that section that I forgot about it. It must be a simple health check (-H) as I know it doesn't spin up the drives, or at least not my WD Reds.

cyberjock · Sep 5, 2014

Yeah, it does something equivalent to "-a" and it looks for anything "bad" and reports only if something is actually wrong. The downside is that if you misconfigure this and SMART is turned off you don't really have any indicator of a problem, and you might not until it is too late. That's why I do my nightly email that gives a quick printout of temps and current pending sector count as those 2 are pretty solid "sh*t is going bad" indicators.

diskdiddler · Sep 6, 2014

Well there's bad sectors, it's nearing the 24 hour mark for performing the test, I predicted it would take significantly longer due to disk activity, it's really going to cause head thrash.
None the less there's bad sectors - what's interesting is it was estimated to complete midday today, it's now 10:30pm and I can see in the logs when I log in (the notify log at the bottom) that it's found bad sectors at 9:30pm
Interestingly, I've already been emailed about these sectors, as if it's finished testing but I imagine that is infact not the case.

So the point of my post, what's the command to view the current status of the existing smart long check? I'm hesitant to reboot or really do anything until the smart check is finished.

joeschmuck · Sep 6, 2014

diskdiddler said:
So the point of my post, what's the command to view the current status of the existing smart long check? I'm hesitant to reboot or really do anything until the smart check is finished.

"smartctl -t long /dev/adaX" and it will tell you if the test is in progress/status.

Edit: And you can use the -X parameter to cancel the test.

diskdiddler · Sep 6, 2014

Ok thanks man.
So when the monthly scrub would have run, it would probably have also picked these bad sectors up right? then informed me about them? (I'm going to make the assumption the bad sectors in this instance are within the earlier part of the disk where data is contained, for the sake of ease) ?

cyberjock · Sep 6, 2014

And a SMART test stops on error, so as soon as the test had run into its first bad sector the test should have failed with an entry in the log....

So your "concern" about head thrash is nothing more than talking up a problem that doesn't exist. Not to mention the fact that your disk does seeks ALL the time.

You know, you keep talking about SMART as if it's "a matter of fact" and yet you are continually having to be corrected. Have you considered that your actual knowledge of SMART is incorrect and/or outdated?

jgreco · Sep 6, 2014

Scrubs only examine portions of the disk that are in use. So, no, not necessarily.

diskdiddler · Sep 6, 2014

cyberjock said:
And a SMART test stops on error, so as soon as the test had run into its first bad sector the test should have failed with an entry in the log....

So your "concern" about head thrash is nothing more than talking up a problem that doesn't exist. Not to mention the fact that your disk does seeks ALL the time.

You know, you keep talking about SMART as if it's "a matter of fact" and yet you are continually having to be corrected. Have you considered that your actual knowledge of SMART is incorrect and/or outdated?

Listen you're going to have to calm down or stop making silly posts.
A 500 minute test initiated last night took over twice as long, my disk has been ticking endlessly all day.
Maybe you think I've got some kind of "Staging" drive? Or I don't use the server regularly? It's reading and writing regularly, I have a SINGLE array. ALL writes are going to hit ALL disks, I've been writing to that disk for 4 days straight at 4kb a second.
The test took over twice as long as estimated.

I'd prefer if the others help, too much misinformation from you the past 8 weeks.

EDIT: furthermore my SMART bad sectors began at 12:53pm and continued on until 9pm - so your "concern" that a smart test will stop as soon as it find a single entry is wrong.
Just because I'm not a BSD guru doesn't mean I don't know hardware, I've been dealing with hard disks for over 20 years. EVERY full smart check I've done takes HOURS upon hours, because it checks the entire disk surface, furthermore - frankly, it's a miracle that FreeNAS allows you access to the disk while it's running, generally the disk is completely flat out busy and not in a state to be interrupted, hence my surprise at it still being accessible in the first place.

diskdiddler · Sep 6, 2014

jgreco said:
Scrubs only examine portions of the disk that are in use. So, no, not necessarily.

That definitely makes sense to me based on my understanding, my point is - if a scrub is run and encounters a bad sector, what will the system do from that point?

cyberjock · Sep 6, 2014

Yes, *every* long test I've done on platter based media takes hours and hours.. and on first error it stops and records the error. Why do I know this? Because the log only logs the "first" error. A SMART test doesn't give you a list of bad sectors or anything. /sigh SMART tests are "pass/fail" and pass is only if there are no errors. Once you've found an error the test is a fail, regardless of how many errors you find. Since the long can only hold one entry it simply stores the first error and the test is terminated. Feel free to read the SMART testing criteria from the late 1990s. It explains this in detail....

SSD's fall into two categories:
1. Long tests aren't supported.
2. They literally take about 20 seconds. (hint: They are nothing more than a short test)

It is entertaining to see you talk about that which you don't understand, but think you do. I really wish you'd stop and try to learn a few thing about this stuff. I can tell you are very intelligent, just not well informed on this topic.

Clicking doesn't mean that a SMART test is running.... in fact, you have no way to know what it means because you can't directly access the disk firmware to make the query. But if you are familiar with the "click of death" then it's pretty easy to figure out that it's at least trying to read from bad sectors. But you can't actually determine if it's from a SMART test or not. :p

Edit: And the fact you aren't familiar with querying disk SMART data with smartctl is clear evidence you aren't the pro that you think you might be.

joeschmuck · Sep 6, 2014

diskdiddler said:
That definitely makes sense to me based on my understanding, my point is - if a scrub is run and encounters a bad sector, what will the system do from that point?

It's not that simple. But Cyberjock beat me to the punch, he's a faster at typing than I am.

There is a lot of information you can Google about this stuff vice asking these simple questions. I wish you would do that more.

diskdiddler · Sep 6, 2014

It's difficult to take things you say seriously when generally "the sky is falling" is your attitude, like I said last time I pointed this out, you seem to assume people are running mammoth business production systems with 256gb of ram, 76 drives and gigabit networks. As the developers have started noting in some threads and on the bug tracker, there's going to be mid-range users starting to use this OS more and more as it becomes easier to work with

In one thread, where I pointed out I had attained 80mb/s sustained writes, which dropped to 20mb/s sustained you noted the CPU in my server was weak (it isn't great) and it's probably that, despite evidence it had attained higher speeds earlier. You've insisted without ECC ram the sky will fall, you've told me that anything less than 8gb is sacrilege (yet someone posted the other day about a system which had run for over a year, with TWOgb) You can see my skepticism here.

Regardles, my disk is dead and my data is still good, so that's the important thing. I'll shut her down next week, pull the disk and wait the shitty 4 weeks it'll take to replace in this backwards country with awful retail support and long mailing times if you go direct to manufacturer.
Yes I know ticking isn't going to be 100% indicative of disk activty, or rather, system level disk activity. It could be the same WD spindown weirdness of endlessly parking heads or any other combination of thing. Regardless I have not only a ticking disk but a disk that ticked significantly more today. Actuators need to move, this is why sequential stuff is best while we're still stuck with ghastly mechanical disks

cyberjock · Sep 6, 2014

What works and what is smart are two different things. You just can't see more than black and white. I do NOT insist that without ECC RAM the sky will fall. I'vemade it plenty clear and I'm not going to reiterate what you should already know from reading my thread. If you make the choice to go non-ECC that's your call. I'm only interested in people knowing about the danger that most people won't know about. That's the *only* reason I discuss it at all. Most of us have built home servers with non-ECC for a decade or more (myself included).

I know there's going to be mid-range users. No duh sherlock. I'm one of them and have been for more than 2 years. The fact you are even trying to tell me that I don't know about midrange users despite me being one of them shows you really are clueless.

The developers had noticed this long before FreeNAS and NAS4Free parted ways. Pretty entertaining that you seem to think they've "started noting" as they've known for a very long time.

joeschmuck · Sep 6, 2014

A system running for 2 years on 2GB RAM was obviously using older FreeNAS software which doesn't require the RAM needed to run today's software. You can run the current version of FreeNAS with 4GB RAM on some systems, however many people have issues and they squarely point to the RAM limitation.

ECC RAM is required if you use ZFS, run scrubs, and value your data. I wouldn't say the sky is falling though.

cyberjock · Sep 6, 2014

He's in for a surprise with 9.3..... it's going to give a warning if you don't have 8GB of RAM on bootup... lol. So it's not just "cyberjock being a bitch". Word is out that the requirements need to be better handled and the devs are doing something about it.

And if he had been here 2+ years ago he'd know that one thing that's changed is that we aren't loosing zpools at the same frequency we used to. We've written stickies for many of the mistakes of yester-year like using non-ECC RAM, insufficient RAM, bad choices of hardware, etc to help prevent people from losing data.

Anyone remember when we were losing pools at least once every 2-3 days? We "only" lose about 1 pool a week on average- and every time the server admin had ignored multiple stickies. I'd say that's a massive improvement from 2 years ago..... *because* we're informing the masses of the cause for failures.

In fact, there's only 1 pool I'm aware of where the owner did everything right and still lost a pool. So I'd call that a damn good track record.

joeschmuck · Sep 6, 2014

I know this is off topic, but the 8GB RAM check, are they testing it for something like 7GB or 7.5GB as some system will consume some RAM for onboard video. I know that was my recommendation.

Important Announcement for the TrueNAS Community.

Can't view previous SMART test logs &is it in sequence?

Resident Grinch

Old Man

Old Man

Inactive Account

Old Man

Inactive Account

Wizard

Old Man

Wizard

Inactive Account

Resident Grinch

Wizard

Wizard

Inactive Account

Old Man

Wizard

Inactive Account

Old Man

Inactive Account

Old Man

Similar threads