smartd not logging?

Status
Not open for further replies.

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
I have one drive that tends to run a bit warm (idle at 37, peak of about 42), and so I wanted to log its temperature over the long haul, longer than "smartctl -l scttemp" will do.

I tried configuring SMART to check every minute, ignore the power mode, and log an informational message whenever the temperature is above 1 degree C.

I see smartd stop and restart:

Code:
Sep 20 08:54:40 freenas notifier: Stopping smartd.
Sep 20 08:54:40 freenas notifier: Waiting for PIDS: 5243.
Sep 20 08:54:40 freenas notifier: smartd not running? (check /var/run/smartd.pid).
Sep 20 08:54:40 freenas notifier: Starting smartd.


But I don't see anything being logged anywhere.

/var/run/smartd.pid has the right process number.
ps shows "/usr/local/sbin/smartd -i 60 -c /usr/local/etc/smartd.conf".
smartd.conf shows:
Code:
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
/dev/da0 -a -n never -W 0,1,0
/dev/da1 -a -n never -W 0,1,0


So everything looks like it's running and configured properly. I just don't see any logging output.

Any ideas?
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
BigDave said:
I think this post might help you out.

I'm not seeing the connection. I'm asking about disk temperature, not CPU temperature.

There's a UI for configuring the kind of monitoring I'm trying to do, but it's not working.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
I thought you might script a cron job for your HDD temperature monitoring needs, with the logging that you seem to need. Sorry if that didn't help you. :oops:
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
Got it. Thanks for the thoughtful responses!

So my question is: why isn't smartd doing what it seems like it's supposed to do as configured?

(I've seen something in the smartd documentation that talks about starting tests 30 minutes after smartd is launched, but that's not the issue: there aren't any smartd log messages for hours.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, you need to understand what those 3 temps mean. Please Google differential, critical and informational. They likely don't mean what you think they mean. I find them to be a little backwards myself. I'd expect critical to be of a higher priority than informational, but it's actually the reverse.

Second, smartd doesn't log temps except in certain circumstances. Again, informational is all that is logged IIRC. Critical can, but only for some situations. It's kind of bizarre.

Third, the temps only trigger when you go above the set temp. So if you set your informational temp to 40C and the drive is 35C and heats up to 42C over an hour (time frame is irrelevant) you'll get a message (if set for the proper temp monitoring category) when you first exceed the temp. Then you'll never get another warning unless and until it goes below and then exceeds it again. If it heats up to 42C today and stays above your 40C limit for the next 10 years you'll never get another warning. Do note that on first bootup (and starting of the smartd service) that the temperature is considered to be 0C until the first check. So if you restart your system and it ends up above the setpoint right off the bat you'll get an email. Just don't do what I did and think you'll set it to email you when any disk goes above something like 10C. You'll get an email for every drive on bootup. If you are extra silly and do what I do and send it to a text message on your cell phone it'll start going berserk for 10 minutes while they all come in. ;)

I would highly recommend that you read up on how smartd works and what all the crap means. It's not completely intuitive (and if this is your first time actually using smartd you'll understand what I mean when you read up on it). I spent like 12 hours reading documentation and testing before I really understood 'the bigger picture' with smartd. It can be confusing as all hell until it clicks in your head. Then you'll be a pro and you'll laugh at how dumb you felt when you created the thread. Don't take offense, I don't know anyone that got smartd right the first time without some growing pains. Just don't look at some of my really old posts from 2012... they are embarrassing because I looked so stupid too. ;)
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
cyberjock said:
They likely don't mean what you think they mean. I find them to be a little backwards myself. I'd expect critical to be of a higher priority than informational, but it's actually the reverse.

What do you mean informational is higher priority? Looking at syslog.h and the smartd source code, LOG_CRIT is higher priority than LOG_INFO. Also, only critical temperatures trigger emails.

cyberjock said:
Second, smartd doesn't log temps except in certain circumstances. Again, informational is all that is logged IIRC. Critical can, but only for some situations. It's kind of bizarre.

That's not quite what I'm seeing in the code. At each interval:
  • Differential: if the temperature changes by +-(differential) since the last time a difference was logged (or the min/max temperature has changed), send an informational log to syslog.
  • Critical: if the temperature has exceeded the critical temperature, send a critical log to syslog. If a critical email hasn't already been sent (and a mailing address is configured), send one. If the temperature then drops below the informational limit (if there is one) or critical-5 (if there isn't), send another email advising of the change, and reset the email counter.
  • Informational: if the temperature isn't critical, but is >= the informational temperature, send an informational log to syslog.
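
The three rules above can be sketched in code. This is a minimal Python model of the behaviour as described in this post, not smartd's actual implementation: the `TempMonitor` class, its field names, and the event labels are all hypothetical, and min/max tracking is left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (channel, reason); both labels are illustrative


@dataclass
class TempMonitor:
    """Hypothetical model of the three temperature limits; 0 disables a limit."""
    diff: int = 0
    info: int = 0
    crit: int = 0
    last_logged: Optional[int] = None  # temperature at the last differential log
    crit_mailed: bool = False          # True once a critical email has gone out

    def check(self, temp: int) -> List[Event]:
        events: List[Event] = []

        # Differential: log when the reading moved by at least `diff` since
        # the last logged value (the first reading is logged as the baseline).
        if self.diff:
            if self.last_logged is None or abs(temp - self.last_logged) >= self.diff:
                events.append(("LOG_INFO", "changed"))
                self.last_logged = temp

        if self.crit and temp >= self.crit:
            # Critical: logged at every check, mailed only once.
            events.append(("LOG_CRIT", "critical"))
            if not self.crit_mailed:
                events.append(("EMAIL", "critical"))
                self.crit_mailed = True
        else:
            if self.info and temp >= self.info:
                # Informational: logged at every check while at or above the limit.
                events.append(("LOG_INFO", "limit"))
            # Recovery threshold: the info limit, or crit - 5 when no info limit is set.
            reset_below = self.info if self.info else self.crit - 5
            if self.crit_mailed and temp < reset_below:
                events.append(("EMAIL", "recovered"))
                self.crit_mailed = False
        return events


# Settings in the style of "-W 0,1,0": no differential, info at 1, no critical.
monitor = TempMonitor(diff=0, info=1, crit=0)
for reading in (37, 38, 42):
    print(reading, monitor.check(reading))
```

With these settings every check of a drive in the high 30s yields one LOG_INFO event, which is why a syslog entry per polling interval would be expected.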

cyberjock said:
Third, the temps only trigger when you go above the set temp. So if you set your informational temp to 40C and the drive is 35C and heats up to 42C over an hour (time frame is irrelevant) you'll get a message (if set for the proper temp monitoring category) when you first exceed the temp. Then you'll never get another warning unless and until it goes below and then exceeds it again.

At your recommendation I reviewed the source code for smartd, and what you say is accurate for the email notification.

But for logging to syslog, it should happen at every interval. I'm OK with the system log being noisy (for now anyway). It won't overpower my inbox or my SMS messages.

So why am I not seeing any logging? I was "extra silly" and asked it to log every time the drive was above 1 degree C. That ought to be every polling interval. But it isn't.

For an added bonus, "killall -USR1 smartd" isn't even logging "Signal USR1 - checking devices now rather than in %d seconds.\n".
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
pjc said:
For an added bonus, "killall -USR1 smartd" isn't even logging "Signal USR1 - checking devices now rather than in %d seconds.\n".

Aha, this was the key, along with the following hint from the smartd man page:

"Under Solaris with the default /etc/syslog.conf configuration, messages below loglevel LOG_NOTICE will not be recorded. Hence all smartd messages with loglevel LOG_INFO will be lost."

FreeNAS isn't Solaris, but it turns out to be configured the same way: syslog by default drops anything less important than LOG_NOTICE, so LOG_INFO messages aren't getting logged.
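
The filtering can be illustrated with the standard syslog severity numbers, where a lower number means a more severe message. The sketch below assumes a selector such as `*.notice` keeps messages at that level or more severe; the `is_recorded` helper is hypothetical and only models the comparison, not syslogd itself.

```python
# Severity numbers as defined in <syslog.h>: lower number = more severe.
SEVERITY = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3,
    "warning": 4, "notice": 5, "info": 6, "debug": 7,
}


def is_recorded(message_level: str, selector_level: str) -> bool:
    """Return True if a message at message_level passes a selector at selector_level."""
    return SEVERITY[message_level] <= SEVERITY[selector_level]


# smartd's informational temperature messages versus a "*.notice" selector
print(is_recorded("info", "notice"))   # dropped
print(is_recorded("crit", "notice"))   # kept
print(is_recorded("info", "info"))     # kept once the selector is lowered to info
```

This also matches the earlier point about priorities: LOG_CRIT (2) outranks LOG_INFO (6), so critical messages survive the default filter while informational ones do not.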

It seems like FreeNAS should be logging smartd's INFO messages if it's going to bother giving us the option of setting a differential temperature (which only ever logs anything at the INFO level). This is slightly more complicated than it sounds, because if you just allow all INFO messages from all services, things get a bit noisy.

On the plus side, fixing it the right way would let them put smartd messages in their own smartd.log file too.

But in the meantime, for my purposes, changing the "*.notice" to "*.info" in /etc/syslog.conf and then "killall -HUP syslogd" to reload the config does the trick.

...and interestingly my "hot" drive reports a ludicrously high airflow temperature (63 degrees) in addition to the warm-but-sane operating temperature of 37.
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
Strange footnote to the crazy temperature: the "raw" value for Airflow_Temperature_Cel reported by smartctl matches the Temperature_Celsius exactly. For some reason that's getting converted to a high number in inverse proportion: 63 corresponds to 37 degrees, 62 corresponds to 38 degrees, and 69 corresponds to 31 degrees.

Probably just buggy Seagate firmware (it's an old 500GB ES drive).
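
The three pairs quoted above all sum to 100, so the readings are consistent with the normalized value being 100 minus the raw temperature. That relationship is an inference from these three data points, not something confirmed for this drive's firmware; the helper name below is made up for illustration.

```python
# (normalized value, raw temperature in Celsius) pairs quoted in the post
observed = [(63, 37), (62, 38), (69, 31)]


def normalized_from_celsius(celsius: int) -> int:
    """Assumed mapping: normalized value = 100 - temperature in Celsius."""
    return 100 - celsius


# Check each quoted pair against the assumed mapping.
for value, celsius in observed:
    print(value, celsius, normalized_from_celsius(celsius) == value)
```

If the mapping holds, the "63" is not a second temperature reading at all, just the same 37 degrees expressed on an inverted scale.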
 