
Set up SMART Reporting via email

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
@madmax
That is a new one for me. So you say you get an email which states "/etc: Permission denied"?

Did you change the script other than the drive identifications? You should post your script so I can see it; just change your email address before posting. What bothers me is that you said it runs fine on its own, but when you use CRON it still runs and the script generates an email with "/etc: Permission denied". I don't see how that is possible with the script I wrote, unless you made some modifications. I'll wait for your next posting.
 

madmax

Member
Joined
Aug 31, 2012
Messages
64

#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh
(
echo "To: myemailaddresswashere"
echo "Subject: SMART Drive Results for all drives"
echo " "
) > /var/cover
smartctl -i -H -A -n standby -l error /dev/ada0 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada1 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada2 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada3 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada4 >> /var/cover
sendmail -t < /var/cover
exit 0


I checked the file permissions, which were rw-r--r--, then tried different permissions and got the same result.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
Could you post a copy of your email, just remove the email address of course. I want to see the subject line and the message body.

The problem I'm having is trying to understand how or where the message you say you get is getting into the file /var/cover.

You might try this, not sure it will echo properly but it won't hurt... Make sure you are displaying your console messages in the footer of course.

#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh
(
echo "To: myemailaddresswashere"
echo "Subject: SMART Drive Results for all drives"
echo " "
) > /var/cover
cd /var
echo "Email Header: "$cover
smartctl -i -H -A -n standby -l error /dev/ada0 >> /var/cover
echo "Drive ada0: "$cover
smartctl -i -H -A -n standby -l error /dev/ada1 >> /var/cover
echo "Add Drive ada1: "$cover
smartctl -i -H -A -n standby -l error /dev/ada2 >> /var/cover
echo "Add Drive ada2: "$cover
smartctl -i -H -A -n standby -l error /dev/ada3 >> /var/cover
echo "Add Drive ada3: "$cover
smartctl -i -H -A -n standby -l error /dev/ada4 >> /var/cover
echo "Add Drive ada4: "$cover
sendmail -t < /var/cover
exit 0


Last question... What FreeNAS version are you running? I know this runs fine in 8.0.4 but personally have not tried it with any other version, although I don't know why it wouldn't run. As you said, yours works fine when you run it manually. I'm thinking CRON has problems.

I never asked you how you have CRON setup to run.
Here is my setup...
Capture1.JPG Capture2.JPG
And of course Enabled is checked.

My script is on a USB Flash drive so I don't need to use the boot USB drive but it was tested and runs fine either way.
 

madmax

Member
Joined
Aug 31, 2012
Messages
64
I got it working with CRON and got a good email with all the SMART info. Your first post has the CRON setup with the Command line as follows, and that's what I used:

Command: /etc/sh esmart.sh

Quick question: does "sh" in this command mean a directory or an executable? I use the command to execute the script, but this confused me; I thought the script might need to be in a folder called "sh".

For some reason that form did not work for me. I'm not sure why; maybe it's a version difference. I'm using FreeNAS 8.2 on an SSD.

The command that worked for me was:
sh /etc/esmart.sh

This worked without problem.
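
On the question above: sh is the shell program (an executable), not a directory. 'sh /etc/esmart.sh' asks the shell to run the script found at that path, whereas '/etc/sh esmart.sh' tries to execute a program called sh inside /etc. The sketch below shows the difference with a throwaway script in a temporary directory; the file name demo.sh is made up for illustration and nothing here touches /etc.

```shell
#!/bin/sh
# Create a throwaway script; demo.sh is a hypothetical name.
tmpdir=$(mktemp -d)
printf '%s\n' 'echo "script ran"' > "$tmpdir/demo.sh"

# Correct form: the interpreter (sh) followed by the full path to the script.
good=$(sh "$tmpdir/demo.sh")
echo "sh <dir>/demo.sh  -> $good"

# Wrong form: looks for a program named "sh" inside the directory.
# No such program exists there, so the command fails.
if "$tmpdir/sh" demo.sh 2>/dev/null; then
    bad="ran"
else
    bad="failed"
fi
echo "<dir>/sh demo.sh  -> $bad"

rm -r "$tmpdir"
```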

Thanks Joe for your attention and script.

My next step is to try to get a short and long test for all hard drives under one script, or at least one CRON job.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
If you look at post #12 I had it listed better but I'm glad it's working now.

As for a script to do both short and long, that is easy to do and there are many ways to think about it. If you want to run only one CRON job then you need to make the script know the day so it will choose a short or long test on its own, or you could run two CRON jobs while passing a parameter (switch) that will trigger long or short. You have some examples here to start with, good luck.
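
A minimal sketch of both approaches, not a tested FreeNAS script. The function names pick_test and run_tests, the choice of Sunday for the long test, and the SMARTCTL override used for dry runs are all assumptions made for illustration; 'smartctl -t short' and 'smartctl -t long' are the standard ways to start a self-test.

```shell
#!/bin/sh
# pick_test DAY -> prints "long" on Sunday (7), "short" otherwise.
# DAY uses the `date +%u` numbering: 1 = Monday ... 7 = Sunday.
pick_test()
{
    if [ "$1" -eq 7 ]; then
        echo long
    else
        echo short
    fi
}

# run_tests TYPE DRIVE... -> start the chosen self-test on each drive.
# SMARTCTL can be overridden (e.g. SMARTCTL=echo) for a dry run.
run_tests()
{
    testtype=$1
    shift
    for drive in "$@"
    do
        ${SMARTCTL:-smartctl} -t "$testtype" "/dev/$drive"
    done
}

# One CRON job, script decides by day (not executed here):
#   run_tests "$(pick_test "$(date +%u)")" ada0 ada1 ada2 ada3
# Two CRON jobs, each passing the switch explicitly:
#   run_tests short ada0 ada1 ada2 ada3     # daily job
#   run_tests long  ada0 ada1 ada2 ada3     # weekly job
```

The first form keeps everything in one CRON entry; the second keeps the script simpler and leaves the schedule in the GUI.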
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I figured I'd post my version since it is different. I am using an Areca RAID controller so I have to get my SMART information from the areca-cli utility included with FreeNAS. Credit to joeschmuck for providing the script on which mine is based.

I execute a modified esmart.sh (tweaked slightly to make me happy) that calls areca-cli and feeds it the necessary commands to get data from all 24 ports. If a port is empty it will reply with an error but will continue to the next port. That way I don't have to keep changing the script if I add/remove/move drives. I also keep my data on the zpool just in case I want it someday.

I have 2 major 'feature' additions for me though:
1. I create a file with the SMART data for all of the drives and then grep Pending to get a printout of just the lines with the Current Pending Sector count. Now when I get the email, instead of going through 47 KB of email I can look at the first 28 lines (4 drives on my onboard Intel controller plus 24 from the Areca). If they are all zeros then I know things are going well. Current Pending Sector count is not an end-all-be-all for failing disks, but it is a very good indicator. I had to do this because the areca-cli does not allow for a printout of just the error log for the hard drives.
2. I also grep Temperature to get a printout of all of the drive temps. The areca-cli returns the temps in Fahrenheit even though it says C. I just have to ignore the C and realize that the drives aren't 115C but 115F. :)

Code:
rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`
rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`b

(
echo "To: ***youremail@goeshere.com***"
echo "Subject: SMART Drive Results for hard drives in ***yourservernamehere***"
echo " "
echo "The following lists the Current Pending Sector Count for all hard drives on the system in order:"
echo " "
) > /mnt/tank/.SMARTdata/`date +%Y%m%d`

smartctl -a /dev/ada0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada1 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada2 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada3 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
areca-cli < /mnt/tank/.SMARTdata/areca >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b

cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep Pending >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following lists the current temperatures for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep "194 Temperature" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following is the long printout of all SMART data for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

rm /mnt/tank/.SMARTdata/`date +%Y%m%d`b
sendmail -t < /mnt/tank/.SMARTdata/`date +%Y%m%d`
exit 0



My /mnt/tank/.SMARTdata/areca (the commands that are run inside the areca-cli) are:

Code:
set password=***yourRAIDcontrollerpasswordhere***
disk info drv=1
disk smart drv=1
disk info drv=2
disk smart drv=2
disk info drv=3
disk smart drv=3
disk info drv=4
disk smart drv=4
disk info drv=5
disk smart drv=5
disk info drv=6
disk smart drv=6
disk info drv=7
disk smart drv=7
disk info drv=8
disk smart drv=8
disk info drv=9
disk smart drv=9
disk info drv=10
disk smart drv=10
disk info drv=11
disk smart drv=11
disk info drv=12
disk smart drv=12
disk info drv=13
disk smart drv=13
disk info drv=14
disk smart drv=14
disk info drv=15
disk smart drv=15
disk info drv=16
disk smart drv=16
disk info drv=17
disk smart drv=17
disk info drv=18
disk smart drv=18
disk info drv=19
disk smart drv=19
disk info drv=20
disk smart drv=20
disk info drv=21
disk smart drv=21
disk info drv=22
disk smart drv=22
disk info drv=23
disk smart drv=23
disk info drv=24
disk smart drv=24
hw info
exit


The email I get looks like this:
Code:
The following lists the Current Pending Sector Count for all hard drives on the system in order:
 
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
 
The following lists the current temperatures for all hard drives on the system in order:
 
194 Temperature_Celsius     0x0022   113   103   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0022   112   102   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   114   102   000    Old_age   Always       -       38
194 Temperature_Celsius     0x0022   113   101   000    Old_age   Always       -       39
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     110      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     112      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     110      0  OK          
194 Temperature                               0x22     112      0  OK          
194 Temperature                               0x22     113      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     113      0  OK          
194 Temperature                               0x22     115      0  OK          
194 Temperature                               0x22     115      0  OK          
194 Temperature                               0x22     115      0  OK          
 
The following is the long printout of all SMART data for all hard drives on the system in order:

Then a complete printout of all of the drive SMART info is attached so I can match the bad temp/pending sector count to the /dev and serial number. 
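
If you would rather not convert the Areca temperatures in your head, a small filter can do it. This is only a sketch: it assumes the Areca rows look like the sample email above (the reading in the 4th column) and takes the statement that those readings are really Fahrenheit at face value. The function name areca_temps_c is made up for illustration.

```shell
#!/bin/sh
# areca_temps_c: read report lines on stdin and, for Areca-style
# "194 Temperature" rows only, print the 4th column converted from F to C.
# smartctl rows ("194 Temperature_Celsius") are skipped because their
# second field does not match exactly.
areca_temps_c()
{
    awk '$1 == "194" && $2 == "Temperature" {
        printf "%s F = %.1f C\n", $4, ($4 - 32) * 5 / 9
    }'
}

# Demo on two lines copied from the sample email above.
# Prints: 111 F = 43.9 C
areca_temps_c <<'EOF'
194 Temperature_Celsius     0x0022   113   103   000    Old_age   Always       -       39
194 Temperature                               0x22     111      0  OK
EOF
```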
 

docthomas

Junior Member
Joined
Jan 24, 2013
Messages
13
This is a simple way to get SMART monitoring to report daily the status of your hard drives.

The purpose of this is to have all your SMART enabled drives report how they are doing daily. I wanted to see how many times the drives spun up and down, see if there were any flaky transmission errors because I had a bad SATA cable and it was detected through SMART.

This code is not persistent between upgrades of FreeNAS and must be placed back on the boot drive. You can run this from one of your hard drives; however, it will force them to spin up each time, which you may not want.

This is a very simple script and implementing it will take very little time.

First the basic instructions:
NOTE: Do not type the single quotes, they are simply around the text to type. And it is assumed you already have FreeNAS email all set up. Sendmail will fail if it is not set up.

1) SSH or use the console and log in as root/SU.

2) Type 'mount -wu /'

3) Type 'cd /conf/base/etc'

4) Type 'ee esmart.sh'

5) Cut and paste the script below into the editor.

Here is the simple script:
Code:
#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh /dev/ada0
switch1=$1
(
echo "To: YourEmail@Address.net"
echo "Subject: SMART Drive Results for ${switch1}"
echo " "
) > /var/cover
smartctl -i -H -A -n standby -l error ${switch1} >> /var/cover
sendmail -t < /var/cover
exit 0

# Set the power mode check so the drive doesn't spin up.
# Options -n standby = Will not let the drive spin up if it's not currently spinning.  This means that no data will be present if the drive is not running because it exits out with an error condition.  This is nice for those folks who like to use HDD Standby in FreeNAS.
# -i = Device Info (Does not force a drive spinup)
# -H = Device Health (Forces spinup)
# -A = Only Vendor specific SMART attributes (Forces spinup)
# -l error = SMART Error Log (Forces spinup)


Here is the more complex script, which brings something extra (it is not explained line by line like the basic script is in the text below, but you should be able to follow it). With the previous script, if the drive is in standby you get a report that doesn't tell you much because the drive is not running. This script periodically polls the hard drive until it is out of standby and then generates the report. It also cleans up the report a bit and, more importantly, it can be run on all the drives in the same time period, whereas with the previous script you could only run a CRON job on one drive, wait a minute, and then run another CRON job. This is by far the better script of the two.
Code:
#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh /dev/ada0
# switch1 is the drive to check (passed parameter)
switch1=$1

# This will use the characters after "/dev/" for the temp file names.
# Example: /dev/ada0 becomes coverada0 or cover0ada0 or cover1ada0
# This needs to be done to keep multiple jobs from using the same files.
drv=`echo $switch1 | cut -c6-`

# Variable just so we can add a note that the drive was asleep when the
# application started but is now awake.
c=0

# Process to run our check on the drive
chkdrive()
{
smartctl -H -n standby -l error ${switch1} > /var/cover0${drv}
}

(
echo "To: youremail@address.net"
echo "Subject: SMART Drive Results for ${switch1}"
echo " "
) > /var/cover${drv}
chkdrive
while [ $? != "0" ]
do
# Pause the checking of the drive to about once a minute if the drive is not running.
  sleep 59
  c=1
  chkdrive
done

if [ $c -eq 1 ]
 then
 echo "THE DRIVE WAS ASLEEP AND JUST WOKE UP" >> /var/cover${drv}
fi

# These lines remove the automatic Branding lines
sed -e '1d' /var/cover0${drv} > /var/cover1${drv}
sed -e '1d' /var/cover1${drv} > /var/cover0${drv}
sed -e '1d' /var/cover0${drv} > /var/cover1${drv}
sed -e '1d' /var/cover1${drv} > /var/cover0${drv}

cat /var/cover0${drv} >> /var/cover${drv}
sendmail -t < /var/cover${drv}

# Cleanup our trash
rm /var/cover${drv}
rm /var/cover0${drv}
rm /var/cover1${drv}
exit 0

# Set the power mode check so the drive doesn't spin up.
# Options
# -n standby (Remove this to force a spinup)
# -i = Device Info
# -H = Device Health
# -A = Only Vendor specific SMART attributes
# -l error = SMART Error Log


6) Edit youremail@address.net to reflect the email address you desire the report to be sent.

7) You can edit more of the script if you like, but you can stop here. Let's save this script: press Escape and save it.

8) Test Run the script by typing 'sh esmart.sh /dev/ada0' NOTE: /dev/ada0 needs to be changed to your drive path. Depending on the drive adapter it could be different.

9) Wait a few seconds and check your email.

10) If it worked then let's copy it to /etc so you can run it now without having to reboot. Type:

Code:
cd /etc
cp /conf/base/etc/esmart.sh .


11) Type 'mount -ro /'

Now let me explain this script so you can make changes to it if you like.

The first few lines define this as a script and tell you how to call this script.
Next we assign variable switch1 for use in the commands.
Next is the code within the parentheses which define the email header and creates the file cover in the RAM based area /var to keep the hard drives from being accessed.

Code:
#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh /dev/ada0
# $1 is the command line variable /dev/ada0 in this example
switch1=$1
(
echo "To: YourEmail@Address.net"
echo "Subject: SMART Drive Results for ${switch1}"
echo " "
) > /var/cover



The key line in the script is:
Code:
smartctl -i -H -A -n standby -l error ${switch1} >> /var/cover

This does all the real work: it collects the data on the drive specified in switch1 and adds the results to the file cover. If the drive is not spinning then you will get an email stating the drive is in Standby and smartctl exited, so you get no data. This is fine for most people who want to minimize spinning up their hard drives, but if you really want the data every time you run this script, remove the '-n standby' portion.

This section takes the file we created called cover and sends it via the sendmail application.
Code:
sendmail -t < /var/cover
exit 0


Now we will create a CRON job so this script runs once a day; you can choose how often you want it to run.
12) Open FreeNAS GUI

13) On left side window click on System, Cron Jobs, Add Cron Job

14) Use the following settings: (You may set the time intervals to whatever you desire)
User: root
Command: sh /etc/esmart.sh /dev/ada0
Description: ada0 SMART Results
Minute: Each selected minute: 01
Hour: Each selected hour: 01 (checking at 1 AM)
Day of month: Every N day of month: 1
Leave the rest at the default of all checked and click OK.

15) Add additional Cron Jobs for each additional drive you have.

16) Sit back and watch the reports come in.

If you would rather just have one command to check all the drives, here is a script to check 4 drives and you can modify it as you see fit.

Code:
#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh
(
echo "To: YourEmail@Address.net"
echo "Subject: SMART Drive Results for all drives"
echo " "
) > /var/cover
smartctl -i -H -A -n standby -l error /dev/ada0 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada1 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada2 >> /var/cover
smartctl -i -H -A -n standby -l error /dev/ada3 >> /var/cover
sendmail -t < /var/cover
exit 0

# Set the power mode check so the drive doesn't spin up.
# Options -n standby
# -i = Device Info
# -H = Device Health
# -A = Only Vendor specific SMART attributes
# -l error = SMART Error Log

Again, the '-n standby' will cause an issue at the point where a drive not spinning is encountered. Since I have a single pool of 4 drives, should my first drive exit due to not spinning, I can safely assume my other drives are not spinning either since I have the same HDD Standby (in FreeNAS GUI) settings for each.

-Mark
Great information here!
I have a question about the last script as part of the Cron job. My assumption is that the command syntax in the cron job would be updated to:
/etc/sh scriptname.sh
Is that correct? Since the script for all drives calls esmart.sh the cron job should just contain the new scriptname without the /dev/ada0?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
I'm not following what your question is but if we are only talking about the last script, you can save it as any file name, I used esmart.sh in my example, and just place it in the location specified so it survives a reboot, and add it into the CRON jobs. As it is written it will check drives ada0, ada1, ada2, and ada3. If you do not have these drives (possible due to different physical connection normally) or if you need to add or delete drives, just adjust the script accordingly. Of course ensure you update your email address in the script too.
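
One way to make "adjust the script accordingly" a one-line change is to keep the drive names in a single list and loop over it. The following is a sketch under assumptions, not a replacement for the tested script above: the function name build_report and the SMARTCTL override (used only for dry runs) are made up for illustration, and the smartctl options are the same ones the original script uses.

```shell
#!/bin/sh
drives="ada0 ada1 ada2 ada3"       # edit this list to match your system
report=/var/cover

# build_report FILE DRIVE... -> write the email header to FILE, then
# append one smartctl section per drive.
build_report()
{
    out=$1
    shift
    {
        echo "To: YourEmail@Address.net"
        echo "Subject: SMART Drive Results for all drives"
        echo " "
    } > "$out"
    for d in "$@"
    do
        ${SMARTCTL:-smartctl} -i -H -A -n standby -l error "/dev/$d" >> "$out"
    done
}

# Typical use (not executed here):
#   build_report "$report" $drives
#   sendmail -t < "$report"
```

Adding or removing a drive then only means editing the drives line.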

I hope that helps.

-Mark
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
Thanks for the nice words. I welcome others' scripts because my view is right for me but maybe not for everyone else. I'm going to take a hard look at your scripts to see how they can benefit me, and I'm sure they will give me other ideas. I appreciate the time and effort many people like CyberJock and ProtoSD put into this long and dragging-on-forever project. Your efforts have made some nights more bearable.

-Mark

EDIT: I tested the script and I like it. I have a curious question: why do you search for "194 Temperature" rather than just Temperature? Was there something specific with one of your drives listing more than one Temperature, or maybe your card threw something additional at it? I'm sure there is a reason. And wouldn't it be nice if we could take certain values, test whether they have exceeded a limit, and then change the email subject line to something that would draw immediate attention, like "WARNING DRIVE FAILURE"? I'm not big into scripting, it's a learn-as-I-go thing since Linux/FreeBSD isn't what I use daily at work, so I may work on something like that.
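
A rough sketch of that idea, assuming smartctl-style attribute lines like the ones in the sample email (attribute name in the second column, raw value in the last). The function names pending_total and subject_for and the "anything above zero" limit are illustrative assumptions; a non-zero pending count is a warning sign rather than proof of failure, so treat the subject wording as a placeholder.

```shell
#!/bin/sh
# pending_total: sum the raw Current_Pending_Sector values found in
# `smartctl -A` style output read from stdin (raw value = last column).
pending_total()
{
    awk '$2 == "Current_Pending_Sector" { sum += $NF } END { print sum + 0 }'
}

# subject_for COUNT -> choose the email subject from the pending-sector total.
subject_for()
{
    if [ "$1" -gt 0 ]; then
        echo "Subject: WARNING DRIVE FAILURE - $1 pending sector(s)"
    else
        echo "Subject: SMART Drive Results for all drives"
    fi
}

# Demo: one healthy line from the sample email and one made-up bad line.
total=$(pending_total <<'EOF'
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
EOF
)
subject_for "$total"
```

In a real report script the smartctl output would be collected first, the total computed from it, and the header written afterwards so the subject reflects the result.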
 

bertrem

Junior Member
Joined
Feb 19, 2013
Messages
15
I guess I might as well throw my hat in the ring too :D

Features of my particular script:
  • command line parses arbitrary number of devices and will iterate through all of them.
  • if more than 2 devices are specified, a high-level summary report will also be sent in addition to the more detailed reports sent for each drive
  • reports time elapsed before drive became available (caveat: this is untested)

It should be usable as-is except for the need to specify your own email address and machine name on lines 6 and 7.

This script will not actually run the tests for you. It merely reports the results. It assumes the tests are being periodically kicked off by smartd.

Usage: sh smartmail.sh <drive0> [drive1 [drive2...driveN]]
where driveX is relative to /dev, e.g. ada0

Example: sh smartmail.sh ada0 ada2 ada5 ada7

Code:
#!/bin/sh

# check SMART drive status and mail results to given address

# parameters
email=<insert your email address here>
machine=<insert name of your machine>

# check usage
usage()
{
   echo 'Usage: sh smartmail.sh <drive0> [drive1 [drive2...driveN]]'
   echo 'where <driveN> is the name of a device in /dev, e.g. ada0'
}

if [ $# -eq 0 ]
then
   usage
   exit 1
fi

# send summary report if more than 2 drives are involved
summarize=0
if [ $# -gt 2 ]
then
   summarize=1
fi

# specify process to check the drive
# -n standby  : skip if on standby (use 'never' to force a spinup)
# -H          : show overall health (must be included for summary!)
# -A          : show vendor-specific SMART attributes
# -l error    : show SMART error log
# -l selftest : show SMART test log
chkdrive()
{
   smartctl -n standby -H -A -l error -l selftest ${drivepath} >> ${logfile}
}

if [ ${summarize} -eq 1 ]
then
   logfile_summary=/tmp/smlog_summary
   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status Summary for ${machine}"
      echo " "
      echo "SMART overall-health self-assessment test results"
   ) > ${logfile_summary}
fi

# iterate through all drives
for drive in "$@"
do
   drivepath=/dev/${drive}
   logfile=/tmp/smlog_${drive}

   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status for ${machine}:${drivepath}"
      echo " "
   ) > ${logfile}

   sleepcount=0

   # check on the drive repeatedly until it's awake
   chkdrive
   while [ $? != "0" ]
   do
      sleep 60
      sleepcount=`expr ${sleepcount} + 1`
      chkdrive
   done

   if [ ${sleepcount} -gt 0 ]
   then
      echo " " >> ${logfile}
      echo "DRIVE WAS ASLEEP FOR ROUGHLY ${sleepcount} MINUTES BEFORE STATUS WAS AVAILABLE" >> ${logfile}
   fi

   # extract summary line if desired
   if [ ${summarize} -eq 1 ]
   then
      status=`awk '/overall/' ${logfile} | cut -d: -f2`
      echo "${drivepath}:${status}" >> ${logfile_summary}
   fi

   # remove some gratuitous lines from the file
   sed -i '' -e '/Copyright/d' ${logfile}
   sed -i '' -e '/=== START/d' ${logfile}
done

# send the summary first...
if [ ${summarize} -eq 1 ]
then
   sendmail -t < ${logfile_summary}
   rm ${logfile_summary}
fi

# ...then send individual drive status
for drive in "$@"
do
   logfile=/tmp/smlog_${drive}
   sendmail -t < ${logfile}
   rm ${logfile}
done

exit 0
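
One caveat with the wait loop above: it retries on any non-zero exit status, so a drive that never returns 0 (for reasons other than being asleep) keeps the script waiting forever. A possible variant is sketched below. It assumes smartctl's documented default of exit status 2 for a drive skipped by '-n standby' (check the smartctl man page for your version; status 2 can also mean the device could not be opened), and the function name wait_for_drive and the retry limit are made up for illustration.

```shell
#!/bin/sh
# wait_for_drive MAX_TRIES DELAY CMD...
# Runs CMD and retries only while it exits with status 2, giving up
# after MAX_TRIES attempts. Returns CMD's last exit status.
wait_for_drive()
{
    max=$1
    delay=$2
    shift 2
    tries=0
    while :
    do
        rc=0
        "$@" || rc=$?
        tries=$((tries + 1))
        # Stop on success, on any status other than 2, or when out of tries.
        if [ "$rc" -ne 2 ] || [ "$tries" -ge "$max" ]; then
            return "$rc"
        fi
        sleep "$delay"
    done
}

# Example (not executed here): poll once a minute for up to an hour,
# then note the failure in the log and move on to the next drive.
#   wait_for_drive 60 60 chkdrive ||
#       echo "No SMART data for ${drivepath} (status $?)" >> "${logfile}"
```

Because chkdrive appends to the log on every attempt, the log may contain one standby notice per retry, as it does with the original loop.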
 

SkyMonkey

Member
Joined
Mar 13, 2013
Messages
102
Thanks for this, I've generally got it working. I also included CPU temp reporting in mine, as well as an idle checking script (to look at the state of the drives before and after the SMART commands). I did the idle check to see if the drives spin up when not including -n standby, which they don't (WD 2TB Reds). I have some improvements related to formatting the resulting email that I'd like to figure out to clean things up, and maybe put a summary at the top, but that will wait for now.

One thing though: Is the command in the opening post to restore the file system to read only in step 11 ('mount -ro /') correct? It didn't work, and on reading the man page, I seem to think that either 'mount -o ro /' or the equivalent 'mount -r /' might be correct command? It seemed to work for me, but I am rather a newbie still.
 

bertrem

Junior Member
Joined
Feb 19, 2013
Messages
15
agreed, it's wrong. i use 'mount -o ro /'
 


joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
8,664
I'm curious why step 11 doesn't work as written for you. It has always worked for me and many others, but I agree it is not consistent with the manual. I will change it to 'mount -r /' in the step above. Using "-r" is the same as using "-o ro", so I'd rather pick fewer characters to type.

Thanks for the feedback.

-Mark

EDIT: I was thinking that maybe when typing in the command that since -r has no additional parameters that the "o" was truncated by the shell. Eh, doesn't matter now, the instructions have been corrected.
 

Wolfeman0101

Senior Member
Joined
Jun 14, 2012
Messages
404
I'm running your script with 'sh smartmail.sh ada0 ada1 ada2 ada3 ada4 ada5', but when I leave ada4 and ada5 in, it never finishes. When I try 'sh smartmail.sh ada4' or 'sh smartmail.sh ada5' the same thing happens. Any tips?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Stupid question, but are you SURE your drives are ada4 and ada5?


Are 2 drives on a different SATA controller? If so the controller may not allow you to query SMART data with the same command parameters or at all.
 

Wolfeman0101

Senior Member
Joined
Jun 14, 2012
Messages
404
I'm not sure if they are on different controllers, how could I check? They are ada4 and ada5.

Disks.jpg
 
