Set up SMART Reporting via email

joeschmuck · Mar 26, 2013

Does this help...

I suspect your drives are sleeping.

Try typing:

Code:

smartctl -n standby -H -A -l error -l selftest /dev/ada5

What happens? If it fails to produce results, then try:

Code:

smartctl -A /dev/ada5

And if that produces results, try the first one again. If that works, then your drives were sleeping and there is something going on there. You may need to customize the script for your hardware. If you can't figure it out, how about posting which drives you have installed (make/model)? I'd think ada4/5 may be different from the others.
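
A minimal sketch of checking the result by exit status instead of by eye. It assumes smartctl's documented bit-mask exit status, where bit 1 (value 2) is set when the device could not be opened or is in a low-power mode. The helper name `standby_or_unreadable` is made up for illustration and is not part of the script discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when bit 1 (value 2) of a smartctl
# exit status is set, i.e. the drive was asleep or could not be opened.
standby_or_unreadable() {
    rc=$1
    [ $(( rc & 2 )) -ne 0 ]
}

# On a real system (device name is an assumption):
#   smartctl -n standby -H /dev/ada5 > /dev/null
#   if standby_or_unreadable $?; then echo "asleep or unreadable"; fi

# Demonstration with a few sample status values
for s in 0 2 4; do
    if standby_or_unreadable "$s"; then
        echo "status $s: standby or unreadable"
    else
        echo "status $s: drive answered"
    fi
done
```

Testing the bit rather than comparing against 2 exactly keeps the check working if smartctl sets other bits at the same time.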

Wolfeman0101 · Mar 26, 2013

Code:

[root@nibbler] /mnt/Vol1# smartctl -n standby -H -A -l error -l selftest /dev/ada5
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Device is in STANDBY mode, exit(2)

Code:

[root@nibbler] /mnt/Vol1# smartctl -A /dev/ada5
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       5
  3 Spin_Up_Time            0x0027   176   168   021    Pre-fail  Always       -       6200
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6050
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       85
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       80
193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       15698
194 Temperature_Celsius     0x0022   119   101   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

BossyBear · Mar 28, 2013

Moving to the proper topic... Apologies.

joeschmuck · Mar 28, 2013

dsiminiuk said:
I hate to divert the attention back to email issues but I would like SMART to send also. However...

You have an SMTP server, so you should start a new thread rather than troubleshoot it here. That is outside the scope of this thread.

- - - Updated - - -

@Wolfeman0101

So your drives are sleeping and the script did not fall through like it should have. Could you please post your script again? Don't copy what you had above; I just want to make sure there isn't something wrong, like a typo or a missed line.

-Mark

BossyBear · Mar 28, 2013

The only reason I posted here was because of this post http://forums.freenas.org/archive/index.php/t-6929.html

I'll post new in the proper place. Thanks

Wolfeman0101 · Mar 28, 2013

Code:

sh smartmail.sh ada0 ada1 ada2 ada3 ada4 ada5

Code:

#!/bin/sh

# check SMART drive status and mail results to given address

# parameters
email=****@gmail.com
machine=Nibbler

# check usage
usage()
{
   echo 'Usage: sh smartmail.sh <drive0> [drive1 [drive2...driveN]]'
   echo 'where <driveN> is the name of a device in /dev, e.g. ada0'
}

if [ $# -eq 0 ]
then
   usage
   exit 1
fi

# send summary report if more than 2 drives are involved
summarize=0
if [ $# -gt 2 ]
then
   summarize=1
fi

# specify process to check the drive
# -n standby  : skip if on standby (use 'never' to force a spinup)
# -H          : show overall health (must be included for summary!)
# -A          : show vendor-specific SMART attributes
# -l error    : show SMART error log
# -l selftest : show SMART test log
chkdrive()
{
   smartctl -n standby -H -A -l error -l selftest ${drivepath} >> ${logfile}
}

if [ ${summarize} -eq 1 ]
then
   logfile_summary=/tmp/smlog_summary
   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status Summary for ${machine}"
      echo " "
      echo "SMART overall-health self-assessment test results"
   ) > ${logfile_summary}
fi

# iterate through all drives
for drive in "$@"
do
   drivepath=/dev/${drive}
   logfile=/tmp/smlog_${drive}

   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status for ${machine}:${drivepath}"
      echo " "
   ) > ${logfile}

   sleepcount=0

   # check on the drive repeatedly until it's awake
   chkdrive
   while [ $? != "0" ]
   do
      sleep 60
      sleepcount=`expr ${sleepcount} + 1`
      chkdrive
   done

   if [ ${sleepcount} -gt 0 ]
   then
      echo " " >> ${logfile}
      echo "DRIVE WAS ASLEEP FOR ROUGHLY ${sleepcount} MINUTES BEFORE STATUS WAS AVAILABLE" >> ${logfile}
   fi

   # extract summary line if desired
   if [ ${summarize} -eq 1 ]
   then
      status=`awk '/overall/' ${logfile} | cut -d: -f2`
      echo "${drivepath}:${status}" >> ${logfile_summary}
   fi

   # remove some gratuitous lines from the file
   sed -i '' -e '/Copyright/d' ${logfile}
   sed -i '' -e '/=== START/d' ${logfile}
done

# send the summary first...
if [ ${summarize} -eq 1 ]
then
   sendmail -t < ${logfile_summary}
   rm ${logfile_summary}
fi

# ...then send individual drive status
for drive in "$@"
do
   logfile=/tmp/smlog_${drive}
   sendmail -t < ${logfile}
   rm ${logfile}
done

exit 0

joeschmuck · Mar 30, 2013

Okay, it appears the problem is that this code will wait forever for a drive that is sleeping. It was designed this way so as not to spin up a drive needlessly, but if the drive never wakes up, you will never get a report.

The problem you are experiencing is in this section:

Code:

   sleepcount=0

   # check on the drive repeatedly until it's awake
   chkdrive
   while [ $? != "0" ]
   do
      sleep 60
      sleepcount=`expr ${sleepcount} + 1`
      chkdrive
   done

My advice: if you want to use this code as-is, schedule it to start just before you know your drives will be active (for example, before a routine backup), or don't include ada4 and ada5. You can also launch 2 instances of it, one with the first group of drives and one with the second group.

So ada4 and ada5 must not be part of the same pool as ada0, ada1, ada2, or ada3.

Wolfeman0101 · Apr 1, 2013

All 6 drives are in the same zpool. I'm thinking I should find an internal 6-port SATA card and move all the drives off the motherboard.

joeschmuck · Apr 2, 2013

You are missing the point altogether. If you did exactly as I said in posting #61 above, including the last two sentences, and that last step didn't report the drives as sleeping (you indicated in posting #62 that it worked), then your problem has nothing to do with your controller or your pool. It is simply that your last two drives are sleeping even though you have FreeNAS set up to disable sleeping. This could be a setting in the drive firmware. Have you also looked in the BIOS settings to see if there is a sleep setting for the drives?

Do this for me and post the results, please. I want to see the drive data; maybe it can sort this out:

Code:

smartctl -a /dev/ada0
smartctl -a /dev/ada5

And here is the fix if you know your drives are spinning all the time and you have no intention of them sleeping:

Locate the code for chkdrive and remove the "-n standby" from the smartctl line as indicated below...

Code:


chkdrive()
{
   smartctl -H -A -l error -l selftest ${drivepath} >> ${logfile}
}

With "-n standby" removed (see the comments in the script), smartctl no longer skips a drive that is in sleep/standby, so the script will not wait for it to wake up. If the drive is actually sleeping, it will spin up.

There is no need to change your controller or uproot your drives. Let me know if the code change works.
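
A minimal sketch of making that choice a setting rather than an edit, assuming a variable named `powermode` (a made-up name, not in the original script). smartctl's -n option accepts never, sleep, standby or idle; "-n never" queries the drive regardless of power mode, which is also what happens when -n is left off. The function only prints the command line, so it is safe to try without touching a drive.

```shell
#!/bin/sh
# Hypothetical setting: which power mode should make smartctl skip the drive.
# "never" means always query the drive (it may spin up).
powermode=${powermode:-never}

# Print the smartctl command line that chkdrive would run for one drive.
smartctl_cmd() {
    echo "smartctl -n ${powermode} -H -A -l error -l selftest $1"
}

smartctl_cmd /dev/ada5
```

Setting powermode=standby before the call gives back the original behavior of skipping sleeping drives.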

Wolfeman0101 · Apr 2, 2013

ada0

Code:

smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K3000
Device Model:     Hitachi HDS5C3020ALA632
Serial Number:    ML0210F33S9JJD
LU WWN Device Id: 5 000cca 369f4ded5
Firmware Version: ML6OA800
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Apr  2 13:23:40 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(22966) seconds.
Offline data collection
capabilities: 			 (0x5b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 383) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       106
  3 Spin_Up_Time            0x0007   212   212   024    Pre-fail  Always       -       218 (Average 299)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       88
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   146   146   020    Pre-fail  Offline      -       29
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       6621
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       87
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       256
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       256
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 22/47)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      4994         -
# 2  Short offline       Completed without error       00%      4970         -
# 3  Short offline       Completed without error       00%      4946         -
# 4  Short offline       Completed without error       00%      4922         -
# 5  Short offline       Completed without error       00%      4898         -
# 6  Short offline       Completed without error       00%      4874         -
# 7  Short offline       Completed without error       00%      4850         -
# 8  Short offline       Completed without error       00%      4826         -
# 9  Short offline       Completed without error       00%      4802         -
#10  Short offline       Completed without error       00%      4778         -
#11  Short offline       Completed without error       00%      4754         -
#12  Short offline       Completed without error       00%      4730         -
#13  Short offline       Completed without error       00%      4706         -
#14  Short offline       Completed without error       00%      4682         -
#15  Short offline       Completed without error       00%      4658         -
#16  Short offline       Completed without error       00%      4634         -
#17  Short offline       Completed without error       00%      4610         -
#18  Short offline       Completed without error       00%      4586         -
#19  Short offline       Completed without error       00%      4562         -
#20  Short offline       Completed without error       00%      4538         -
#21  Short offline       Completed without error       00%      4515         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada5

Code:

smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WMAZA8646946
LU WWN Device Id: 5 0014ee 159fd0577
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Apr  2 13:23:42 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(38160) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 368) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3035)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       5
  3 Spin_Up_Time            0x0027   176   168   021    Pre-fail  Always       -       6200
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       89
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6217
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       85
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       80
193 Load_Cycle_Count        0x0032   195   195   000    Old_age   Always       -       15699
194 Temperature_Celsius     0x0022   117   101   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5238         -
# 2  Short offline       Completed without error       00%      4597         -
# 3  Short offline       Completed without error       00%      4573         -
# 4  Short offline       Completed without error       00%      4549         -
# 5  Short offline       Completed without error       00%      4525         -
# 6  Short offline       Completed without error       00%      4501         -
# 7  Short offline       Completed without error       00%      4477         -
# 8  Short offline       Completed without error       00%      4453         -
# 9  Short offline       Completed without error       00%      4429         -
#10  Short offline       Completed without error       00%      4406         -
#11  Short offline       Completed without error       00%      4382         -
#12  Short offline       Completed without error       00%      4358         -
#13  Short offline       Completed without error       00%      4334         -
#14  Short offline       Completed without error       00%      4310         -
#15  Short offline       Completed without error       00%      4286         -
#16  Short offline       Completed without error       00%      4262         -
#17  Short offline       Completed without error       00%      4238         -
#18  Short offline       Completed without error       00%      4214         -
#19  Short offline       Completed without error       00%      4191         -
#20  Short offline       Completed without error       00%      4167         -
#21  Short offline       Completed without error       00%      4143         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Torx · Apr 2, 2013

Hi there,
I found this a nice thread, thanks for the info!

I tweaked the code slightly: if the disk does not spin up on its own within half a day (720 one-minute checks), the script forces the check itself with -n never. If smartctl still fails for any reason, the script stops waiting on that drive anyway.

Code:

#!/bin/sh

# check SMART drive status and mail results to given address

# parameters
email=<your email address>
machine=<Your machine name>

# check usage
usage()
{
   echo 'Usage: sh smartmail.sh <drive0> [drive1 [drive2...driveN]]'
   echo 'where <driveN> is the name of a device in /dev, e.g. ada0'
}

if [ $# -eq 0 ]
then
   usage
   exit 1
fi

# send summary report if more than 2 drives are involved
summarize=0
if [ $# -gt 2 ]
then
   summarize=1
fi

# specify process to check the drive
# -n standby  : skip if on standby (use 'never' to force a spinup)
# -H          : show overall health (must be included for summary!)
# -A          : show vendor-specific SMART attributes
# -l error    : show SMART error log
# -l selftest : show SMART test log
chkdrive()
{
   smartctl -n standby -H -A -l error -l selftest ${drivepath} >> ${logfile}
}

# same check, but forced even if the drive is on standby
# -n never    : query the drive regardless of power mode (may spin it up)
# -H          : show overall health (must be included for summary!)
# -A          : show vendor-specific SMART attributes
# -l error    : show SMART error log
# -l selftest : show SMART test log
chkdrive_forced()
{
   smartctl -n never -H -A -l error -l selftest ${drivepath} >> ${logfile}
}

if [ ${summarize} -eq 1 ]
then
   logfile_summary=/tmp/smlog_summary
   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status Summary for ${machine}"
      echo " "
      echo "SMART overall-health self-assessment test results"
   ) > ${logfile_summary}
fi

# iterate through all drives
for drive in "$@"
do
   drivepath=/dev/${drive}
   logfile=/tmp/smlog_${drive}

   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status for ${machine}:${drivepath}"
      echo " "
   ) > ${logfile}

   sleepcount=0

   # check on the drive repeatedly until it's awake
   chkdrive
   ret=$?
   while [ $ret != "0" ]
   do
      sleep 60
      sleepcount=`expr ${sleepcount} + 1`
      if [ ${sleepcount} -gt 720 ]
      then    
         chkdrive_forced
         ret=$?
         if [ $ret != "0" ]; then
            echo "Could not wakeup drive for SMART test!" >> ${logfile}
            ret=0
         fi
      else
         chkdrive
         ret=$?
      fi
   done

   if [ ${sleepcount} -gt 0 ]
   then
      echo " " >> ${logfile}
      echo "DRIVE WAS ASLEEP FOR ROUGHLY ${sleepcount} MINUTES BEFORE STATUS WAS AVAILABLE" >> ${logfile}
   fi

   # extract summary line if desired
   if [ ${summarize} -eq 1 ]
   then
      status=`awk '/overall/' ${logfile} | cut -d: -f2`
      echo "${drivepath}:${status}" >> ${logfile_summary}
   fi

   # remove some gratuitous lines from the file
   sed -i '' -e '/Copyright/d' ${logfile}
   sed -i '' -e '/=== START/d' ${logfile}
done

# send the summary first...
if [ ${summarize} -eq 1 ]
then
   sendmail -t < ${logfile_summary}
   rm ${logfile_summary}
fi

# ...then send individual drive status
for drive in "$@"
do
   logfile=/tmp/smlog_${drive}
   sendmail -t < ${logfile}
   rm ${logfile}
done

exit 0

Wolfeman0101 · Apr 2, 2013

So with this update the script works :) Thanks so much for being patient. Users like you guys make FreeNAS the superior NAS solution.

SkyMonkey · Apr 5, 2013

For the record, not all drives will spin up when you request SMART data. The 2TB Red drives I am using will happily report SMART data all day long without spinning the platters up. If you copy files, or request a SMART test of some kind, then they spin up.

I've removed the -n flag entirely from the smartctl command in my version of the script.

tanik1 · Apr 8, 2013

Re: Updated Code - Runs SMART test and sends results.

joeschmuck said:

UPDATE: This code runs the SMART Short test and after 5 minutes it will email you the results. In the subject line it will say PASSED or PROBLEM.

Notes: If you change this to run the long test then you must change the wait time appropriately. For my Samsung 2TB drives the wait should be a minimum of 255 minutes according to the drive. I took the sleep timer and set it to 5 hours for a long test.
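
A minimal sketch of reading the wait time from the drive instead of hard-coding it. It assumes the two-line "Extended self-test routine / recommended polling time" layout that smartctl 5.43 prints (see the ada0 output earlier in the thread, which reports 383 minutes). The function name `extended_minutes` and the embedded sample text are for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl capability output on stdin and print
# the recommended polling time (in minutes) for the extended self-test.
extended_minutes() {
    awk '
        /^Extended self-test routine/ { want = 1; next }
        want && /recommended polling time/ {
            gsub(/[^0-9]/, "", $0)   # keep only the digits
            print $0
            exit
        }
        { want = 0 }
    '
}

# Sample text in the layout shown earlier in the thread
sample() {
cat <<'EOF'
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 383) minutes.
EOF
}

minutes=$(sample | extended_minutes)
seconds=$(( minutes * 60 ))
echo "sleep ${seconds}"
```

On a real system the input would come from something like `smartctl -c /dev/ada0 | extended_minutes`. It is probably wise to add some margin on top of the reported value.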

I recommend creating two versions of this, a short-test version and a long-test version; then you can simply call the version you want to run. Here is the short test version.

Call this script as follows: sh /etc/esmart.sh drive
example: sh /etc/esmart.sh /dev/ada0

Code:

#!/usr/local/bin/sh
#
# Place this in /conf/base/etc/
# Call: sh esmart.sh /dev/ada0
# switch1 is the drive to check (passed parameter)
switch1=$1

# This will use the characters after "/dev/" for the temp file names.
# Example: /dev/ada0 becomes coverada0 or cover0ada0 or cover1ada0
# This needs to be done to keep multiple jobs from using the same files.
drv=`echo $switch1 | cut -c6-`

# Variable just so we can add a note that the drive was asleep when the
# application started but is now awake.
c=0

### Run SMART Quick Test
runsmartshort()
{
### If changing to long SMART test, swap the hash marks from the three lines below.
### You may edit the sleep to whatever your drive recommends for the test to finish.
smartctl -t short ${switch1}
# smartctl -t long ${switch1}
echo "Short Test Running, waiting 5 minutes for test to finish."
# echo "Long Test Running, waiting 255 minutes for test to finish."
sleep 300
# sleep 15300
}

### Process to run our check on the drive; reports only the error log and the self-test log.
# Output cover0
chkdrive()
{
smartctl -n standby -l error -l selftest ${switch1} > /var/cover0${drv}
}

### Process to create the email header
# Input cover1, output cover.
makeheader()
{
(
echo "To: youremail@address.net"
printf "Subject: SMART Drive Results for ${switch1} - " ; cat /var/cover1${drv}
echo " "
) > /var/cover${drv}
}

### Process to create the email header for failure
# Input none, output cover.
makeheaderfailure()
{
(
echo "To: youremail@address.net"
echo "Subject: SMART Drive Results for ${switch1} - PROBLEM"
echo " "
) > /var/cover${drv}
}

### Process for normal results
# Input is cover0, output is cover1
procnormal()
{
### Delete lines 1 through 5 leaving the status returned, cover0 cannot be changed here.
sed '1,5d' /var/cover0${drv} > /var/cover1${drv}

### If the drive was asleep we can add a line so the user knows it was sleeping
if [ $c -eq 1 ]
 then
(
echo " "
date
printf "The drive was sleeping and just woke up."
echo " "
) >> /var/cover1${drv}
fi
}

# Process to cleanup our trash files
cleanup()
{
rm /var/cover${drv}
rm /var/cover0${drv}
rm /var/cover1${drv}
}

### Lets test the drive
runsmartshort

### Lets call chkdrive, output is cover0
chkdrive
ret=$?
### If chkdrive returns a value 2 for sleeping then loop
while [ $ret -eq 2 ]
do
### Pause the checking of the drive to about once a minute if the drive is not running.
### This can be changed to more or less frequent, it's a personal choice.
  sleep 59
  c=1
  chkdrive
  ret=$?
done

### If chkdrive returns a value other than 0 before or after sleeping, error.
### (ret is saved above because $? after "done" is the loop's status, not chkdrive's.)
if [ $ret -ne 0 ]
then
makeheaderfailure
### Put the failure header first, then the raw smartctl output
cat /var/cover${drv} /var/cover0${drv} > /var/cover1${drv}
else
procnormal
makeheader
### Chop off all but the most recent 5 test results
sed '11,40d' /var/cover${drv} > /var/cover1${drv}
fi

sendmail -t < /var/cover1${drv}

### Call Cleanup Process
cleanup
exit 0

Like your code, but is there a way to combine all the drives in one email? That is, have it send just the overall summary and then the rest of the drive info in one big email? Sorry, I'm not that great with bash and not sure how to change it to fit my needs. I was able to add getting temps in there, but the rest is a little difficult to figure out.

tanik1 · Apr 9, 2013

Never mind, I was able to figure it out. This is how I did it; I think someone can look into it and probably make it better. Note that I did it for a HighPoint controller. I combined joeschmuck's and bertrem's code and changed it a bit. Thanks to them I was able to get it working this way. It sends 2 emails: one is a summary of the drives and temperatures, and the other has the logs of the drives.

Here how minds look like. you have to do an sh extrasmart.sh 1 2 3 4 (the numbers are for hpt,1/1, hpt,1/2, etc)

Code:

#!/bin/sh

# check SMART drive status and mail results to given address

# parameters
email=youremail@email.com
machine=yourmachinename

### Run SMART Quick Test
runsmartshort()
{
### If changing to long SMART test, swap the hash marks on the three pairs of lines below.
### You may edit the sleep to whatever your drive recommends for the test to finish.
smartctl -t short -d ${drivepath} /dev/hpt27xx
# smartctl -t long -d ${drivepath} /dev/hpt27xx
echo "Short Test Running, waiting 2.5 minutes for test to finish."
# echo "Long Test Running, waiting 255 minutes for test to finish."
sleep 150
# sleep 15300
}

# Make email header
makeheader()
{
   (  echo "To: ${email}"
      echo "Subject: SMART Drive Status Report"
      echo " "
   ) > /var/drivelogfile
}

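# remove drivelogfile and the per-drive logs
cleanup()
{
   rm -f /var/drivelogfile
   # remove every per-drive log, not only the last drive's; chkdrive appends,
   # so a leftover log would carry stale results into the next run
   rm -f /var/smlog_*
}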

# check usage
usage()
{
   echo 'Usage: sh extrasmart.sh <port1> [port2 [port3...portN]]'
   echo 'where <portN> is a HighPoint port number, e.g. 1 for hpt,1/1'
}

if [ $# -eq 0 ]
then
   usage
   exit 1
fi

# send summary report if more than one drive is involved
summarize=0
if [ $# -gt 1 ]
then
   summarize=1
fi

# specify process to check the drive
# -n standby  : skip if on standby (use 'never' to force a spinup)
# -H          : show overall health (must be included for summary!)
# -A          : show vendor-specific SMART attributes
# -l error    : show SMART error log
# -l selftest : show SMART test log
chkdrive()
{
   smartctl -n standby -H -A -l error -l selftest -d ${drivepath} /dev/hpt27xx >> ${logfile}
}

if [ ${summarize} -eq 1 ]
then
   logfile_summary=/var/smlog_summary
   # emit email header
   (
      echo "To: ${email}"
      echo "Subject: SMART Drive Status Summary for ${machine}"
      echo " "
      echo "SMART overall-health self-assessment test results"
   ) > ${logfile_summary}
fi

#where email header for drive starts
makeheader

# iterate through all drives
for drive in "$@"
do
   drivepath=hpt,1/${drive}
   logfile=/var/smlog_${drive}
   
   runsmartshort
   
   sleepcount=0

   # check on the drive repeatedly until it's awake
   chkdrive
   while [ $? != "0" ]
   do
      sleep 60
      sleepcount=`expr ${sleepcount} + 1`
      chkdrive
   done

   if [ ${sleepcount} -gt 0 ]
   then
      echo " " >> ${logfile}
      echo "DRIVE WAS ASLEEP FOR ROUGHLY ${sleepcount} MINUTES BEFORE STATUS WAS AVAILABLE" >> ${logfile}
   fi

   # extract summary line if desired
   if [ ${summarize} -eq 1 ]
   then
      status=`awk '/overall/' ${logfile} | cut -d: -f2`
      temp=`smartctl -a -d ${drivepath} /dev/hpt27xx | grep "194 Temperature" | cut -c5-24,37-40`
      echo "${drivepath}:${status}:${temp}" >> ${logfile_summary}
   fi
   
    #  remove some gratuitous lines from the file
   sed -i '' -e '/Copyright/d' ${logfile}
   sed -i '' -e '/=== START/d' ${logfile}
done

# send the summary first...
# (the CPU temperatures are added here because ${logfile_summary} is only set when summarizing)
if [ ${summarize} -eq 1 ]
then
   echo "Current CPU Temperatures============" >> ${logfile_summary}
   sysctl -a | grep -E "cpu\.[0-9]+\.temp" >> ${logfile_summary}
   sendmail -t < ${logfile_summary}
   rm ${logfile_summary}
fi

# ...then send individual drive status
for drive in "$@"
do
   logfile=/var/smlog_${drive}
   echo "=======This is drive ${drive} smart summary======" >> /var/drivelogfile
   cat ${logfile} >> /var/drivelogfile
done

sendmail -t < /var/drivelogfile
cleanup
exit 0

BillyBob2 · Apr 30, 2013

cyberjock said:

I figured I'd post my version since it is different. I am using an Areca RAID controller, so I have to get my SMART information from the areca-cli utility included with FreeNAS. Credit to joeschmuck for providing the script on which mine is based.

I execute a modified esmart.sh (tweaked slightly to make me happy), call the areca-cli, and feed it the commands needed to get data from all 24 ports. If a port is empty it replies with an error but continues to the next port. That way I don't have to keep changing the script if I add, remove, or move drives. I also keep my data on the zpool just in case I want it someday.

I have 2 major 'feature' additions for me though:
1. I create a file with the SMART data for all of the drives and then grep Pending to get a printout of just the lines with the Current Pending Sector count. Now when I get the email, instead of going through 47 KB of text I can look at the first 28 lines (4 drives on my onboard Intel controller plus 24 from the Areca). If they are all zeros then I know things are good. Current Pending Sector count is not an end-all-be-all for failing disks, but it is a very good indicator. I had to do this because the areca-cli does not allow for a printout of just the error log for the hard drives.
2. I also grep Temperature to get a printout of all of the drive temps. The areca-cli returns the temps in Fahrenheit even though it says C. I just have to ignore the C and realize that the drives aren't at 115C but 115F. :)

Code:

rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`
rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`b

(
echo "To: ***youremail@goeshere.com***"
echo "Subject: SMART Drive Results for hard drives in ***yourservernamehere***"
echo " "
echo "The following lists the Current Pending Sector Count for all hard drives on the system in order:"
echo " "
) > /mnt/tank/.SMARTdata/`date +%Y%m%d`

smartctl -a /dev/ada0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada1 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada2 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada3 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
areca-cli < /mnt/tank/.SMARTdata/areca >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b

cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep Pending >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following lists the current temperatures for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep "194 Temperature" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following is the long printout of all SMART data for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

rm /mnt/tank/.SMARTdata/`date +%Y%m%d`b
sendmail -t < /mnt/tank/.SMARTdata/`date +%Y%m%d`
exit 0

My /mnt/tank/.SMARTdata/areca (the commands that are run inside the areca-cli) are:

Code:

set password=***yourRAIDcontrollerpasswordhere***
disk info drv=1
disk smart drv=1
disk info drv=2
disk smart drv=2
disk info drv=3
disk smart drv=3
disk info drv=4
disk smart drv=4
disk info drv=5
disk smart drv=5
disk info drv=6
disk smart drv=6
disk info drv=7
disk smart drv=7
disk info drv=8
disk smart drv=8
disk info drv=9
disk smart drv=9
disk info drv=10
disk smart drv=10
disk info drv=11
disk smart drv=11
disk info drv=12
disk smart drv=12
disk info drv=13
disk smart drv=13
disk info drv=14
disk smart drv=14
disk info drv=15
disk smart drv=15
disk info drv=16
disk smart drv=16
disk info drv=17
disk smart drv=17
disk info drv=18
disk smart drv=18
disk info drv=19
disk smart drv=19
disk info drv=20
disk smart drv=20
disk info drv=21
disk smart drv=21
disk info drv=22
disk smart drv=22
disk info drv=23
disk smart drv=23
disk info drv=24
disk smart drv=24
hw info
exit

The email I get looks like this:

Code:

The following lists the Current Pending Sector Count for all hard drives on the system in order:
 
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
197 Current Pending Sector Count              0x32     200      0  OK          
 
The following lists the current temperatures for all hard drives on the system in order:
 
194 Temperature_Celsius     0x0022   113   103   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0022   112   102   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   114   102   000    Old_age   Always       -       38
194 Temperature_Celsius     0x0022   113   101   000    Old_age   Always       -       39
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     110      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     112      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     110      0  OK          
194 Temperature                               0x22     112      0  OK          
194 Temperature                               0x22     113      0  OK          
194 Temperature                               0x22     111      0  OK          
194 Temperature                               0x22     113      0  OK          
194 Temperature                               0x22     115      0  OK          
194 Temperature                               0x22     115      0  OK          
194 Temperature                               0x22     115      0  OK          
 
The following is the long printout of all SMART data for all hard drives on the system in order:

Then a complete printout of all of the drive SMART info is attached so I can match the bad temp/pending sector count to the /dev and serial number.

Hello Cyberjack, and thank you for your SMART script. It looks like what I would think should be the default for FreeNAS.

Being the ignoramus that I am, I am having issues making it run.

I don't have any extra controllers as you do. I am using my motherboard's integrated controller, so I don't need that addition to your script.

How should I modify it in order to make your script work for me?
Do I need to create some additional directories?
Set up certain permissions?

And thank you again for your help, it's very much appreciated.

cyberjock · Apr 30, 2013

Name is Cyberjock, not Cyberjack. :P

Your best bet is to Google the commands and alter them to fit your configuration. Your /dev/***, file locations, etc. don't match mine, so you will need to customize it for your configuration. Then, if you add more drives (or take drives away) you'll need to alter the script accordingly.

cyberjock · May 9, 2013

So if you read my thread about the issue I found with the areca-cli not being completely truthful about the SMART data it obtains, you'd recognize that my script is no longer as good as I had intended. So I bring you... v2.0. I have removed all references to the areca-cli since it's garbage. If you want to save the Areca controller config (for instance, you want the write and read cache settings displayed) you'll have to add that yourself. I consider the data to be unnecessary.

Note that I have 4 drives on my Intel SATA controller so I included ada0 through ada3. Also, I chose to query every port on my Areca so that if I add or remove drives I don't need to mess with my script at all.

Code:

rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`
rm -f /mnt/tank/.SMARTdata/`date +%Y%m%d`b

smartctl -a /dev/ada0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada1 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada2 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a /dev/ada3 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,1 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,2 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,3 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,4 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,5 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,6 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,7 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,8 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,9 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,10 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,11 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,12 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,13 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,14 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,15 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,16 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,17 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,18 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,19 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,20 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,21 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,22 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,23 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b
smartctl -a --device=areca,24 /dev/arcmsr0 >> /mnt/tank/.SMARTdata/`date +%Y%m%d`b

(
echo "To: youremailaddresshere"
echo "Subject: SMART Drive Results for hard drives in yourservernamehere"
echo " "
echo "The following lists the Current Pending Sector Count for all hard drives on the system in order:"
echo " "
) > /mnt/tank/.SMARTdata/`date +%Y%m%d`

cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep Pending >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following lists the current temperatures for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b | grep "194 Temperature" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo "The following is the long printout of all SMART data for all hard drives on the system in order:" >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
echo " " >> /mnt/tank/.SMARTdata/`date +%Y%m%d`
cat /mnt/tank/.SMARTdata/`date +%Y%m%d`b >> /mnt/tank/.SMARTdata/`date +%Y%m%d`

rm /mnt/tank/.SMARTdata/`date +%Y%m%d`b
sendmail -t < /mnt/tank/.SMARTdata/`date +%Y%m%d`
exit 0

cyberjock · Jun 13, 2013

Somewhat related topic since I had some failed drives recently... the script I'm using is above this post. The snippet of my email looks like this:

Code:

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

Is there a way to have the script tell me how many lines of CPS I have? I always count the number of drives to make sure they are all there. Since I'm expecting 24 drives I figure a good check is to make sure I have 24 values for Current Pending Sector count. If I don't then I know that some drives aren't functioning in the system.

joeschmuck · Jun 13, 2013

Sure you can count the lines and actually list it at the top of the message with something like "24 Drives Reported". You can either test for the 24 lines of text or better yet, error check each smartctl response and then if you get a failure you can add that to the top of the text message like "23 Drives Responded, areca 15 failed to respond.".
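Counting the lines could look roughly like this. It is only a sketch: it assumes the combined smartctl output is already sitting in a file, and the function names count_reported and drive_count_header are made up for illustration. grep -c prints the number of matching lines.

```shell
#!/bin/sh
# Hypothetical helper: print how many Current Pending Sector lines
# (one per reporting drive) are in the SMART dump file given as $1.
count_reported()
{
   # grep -c prints the count of matching lines (0 if there are none)
   grep -c "Current_Pending_Sector" "$1"
}

# Hypothetical helper: print a one-line header for the top of the email.
# $1 = SMART dump file, $2 = number of drives expected
drive_count_header()
{
   found=$(count_reported "$1")
   if [ "${found}" -eq "$2" ]
   then
      echo "${found} Drives Reported"
   else
      echo "${found} Drives Reported, expected $2"
   fi
}
```

With the v2.0 script above this would be called as drive_count_header /mnt/tank/.SMARTdata/`date +%Y%m%d`b 28, with the result written to the email file right after the blank line that ends the mail headers. The 28 (4 onboard plus 24 Areca ports) is an assumption about that particular system.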

Items I would do:
1) Make each run of smartctl check the errorlevel return value to see if it returns other than zero (0).

Code:

if [ $? -ne "0" ]

2) If not zero then record the drive info into a temp text file, like "areca 15".
3) Set up a counter and after each smartctl increment the counter on a pass (zero) errorlevel.
4) At the end of checking all your drives, if your counter equals the number of drives you have then you can just enter a default header in your text, "24 Drives Reported". If the counter is lower then you can subtract the count from 24 and list the drive(s): "22 Drives Reported, 2 Drives failed to report (areca 15, areca 17)".
5) And of course append the rest of the data to that header.

Easy.
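The five steps above can be sketched as two small functions. Treat this as a rough outline under assumptions, not a finished script: the names record_result, report_header, passed and failed are invented here, and the failed drives are kept in a variable rather than a temp file. Also keep in mind that smartctl's exit status is a bitmask, so a non-zero value can mean the drive answered but reported a problem, not only that it failed to respond.

```shell
#!/bin/sh
# Sketch of steps 1-5. All names are illustrative assumptions.
passed=0          # number of drives whose command returned zero
failed=""         # comma-separated labels of drives that did not

# Run the command for one drive and record pass or fail.
# $1 = label for the report (e.g. "areca 15"), the rest = command to run
record_result()
{
   label=$1
   shift
   if "$@"
   then
      passed=$((passed + 1))
   else
      # append the label, adding ", " only when the list is not empty
      failed="${failed:+${failed}, }${label}"
   fi
}

# Print the header line. $1 = total number of drives expected
report_header()
{
   if [ "${passed}" -eq "$1" ]
   then
      echo "${passed} Drives Reported"
   else
      missing=$(($1 - passed))
      echo "${passed} Drives Reported, ${missing} Drives failed to report (${failed})"
   fi
}
```

A call for one port would then look like record_result "areca 15" smartctl -a --device=areca,15 /dev/arcmsr0 >> datafile, where the redirect still collects the smartctl output, and report_header 24 is written to the top of the message once all drives have been checked.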

EDIT: Since this is very hard-coded for each specific user it may not make any difference, but I would set a variable such as MaxDrives=24. In your setup, because you have so many drives with the same commands, a loop would significantly reduce your text. Things like this make it more usable to other folks in the same situation, although I wouldn't make a huge change at once. Get the basics to work first and then reduce the code in favor of a few loops.
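As a rough idea of what such a loop might look like for the 24 Areca lines in the v2.0 script, here is a sketch. The function name query_areca_ports, the SMARTCTL override and the /tmp output path are assumptions made for this example; the smartctl arguments are the ones already used in the script above.

```shell
#!/bin/sh
MaxDrives=24                             # number of Areca ports to query
outfile=${outfile:-/tmp/smartdata.txt}   # illustrative path, change to suit

# Query every Areca port once and append the output to ${outfile}.
# SMARTCTL can be overridden for a dry run, e.g. SMARTCTL=echo
query_areca_ports()
{
   port=1
   while [ "${port}" -le "${MaxDrives}" ]
   do
      ${SMARTCTL:-smartctl} -a --device=areca,${port} /dev/arcmsr0 >> "${outfile}"
      port=$((port + 1))
   done
}
```

Running it once with SMARTCTL=echo writes the argument list for each port into the file instead of calling smartctl, which is a cheap way to check the loop before pointing it at real hardware.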

Edit2: I would change the Subject line to "ERROR --- (your text)" if there was a failure as well to draw your eye to it.
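Switching the Subject line could be done along these lines. This is a minimal sketch; subject_line is a made-up name, and the count of failed drives is assumed to come from whatever error checking the script already does.

```shell
#!/bin/sh
# Print the Subject header for the email.
# $1 = number of drives that failed to report, $2 = the normal subject text
subject_line()
{
   if [ "$1" -gt 0 ]
   then
      echo "Subject: ERROR --- $2"
   else
      echo "Subject: $2"
   fi
}
```

In the header block of the script, the fixed echo "Subject: ..." line would be replaced by a call such as subject_line 2 "SMART Drive Results for hard drives in yourservernamehere".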
