Critical Alerts: 8 Offline uncorrectable sectors

Status
Not open for further replies.

MichaelBatz (Dabbler, joined Apr 30, 2015, 26 messages)
Hey everybody!

I received this mail.

HTML:
Device: /dev/ada1, 8 Currently unreadable (pending) sectors
Device: /dev/ada1, 8 Offline uncorrectable sectors

The first thing I did was open an RMA for the HDD, because the warranty expires in less than a month. The RMA was accepted within 10 minutes.

I did some more testing, and now I'm a little unsure how to proceed.

I ran an extended offline test and scrubbed the pool.
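A minimal sketch for reading the results of those two checks from the command output instead of by eye. It assumes the text layouts of smartctl -l selftest and zpool status as posted in this thread; failed_selftests and nonzero_error_rows are hypothetical helper names, not part of smartctl, ZFS or FreeNAS.

```shell
#!/bin/sh
# Hypothetical helpers; both read command output on stdin.

# Print self-test log entries that did not complete cleanly.
# Entries look like "# 6  Extended offline  Completed: read failure ...";
# from #10 on there is no space after "#". A test still in progress is
# also printed, because its status is not "Completed without error".
failed_selftests() {
    awk '/^#[ ]*[0-9]+ / && !/Completed without error/'
}

# Print pool/vdev/disk rows of zpool status whose READ, WRITE or CKSUM
# counter is not "0". Device rows have exactly five columns and a counter
# starting with a digit, which skips the header and "errors: ..." lines.
nonzero_error_rows() {
    awk 'NF == 5 && $3 ~ /^[0-9]/ && ($3 != "0" || $4 != "0" || $5 != "0") {
        print $1 ": read=" $3 " write=" $4 " cksum=" $5
    }'
}

# Usage (device and pool names as in this thread):
#   smartctl -l selftest /dev/ada1 | failed_selftests
#   zpool status tank | nonzero_error_rows
```

An empty result from both means no failed self-test entries in the log and no non-zero error counters in the pool.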

Code:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F4BT5K
LU WWN Device Id: 5 000c50 065b0c5b8
Firmware Version: CC27
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jun 15 10:16:36 2016 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 329) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   106   099   006    Pre-fail  Always       -       11595224
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   096   096   020    Old_age   Always       -       4759
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       372164430
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14516
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       41
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   062   037   045    Old_age   Always   In_the_past 38 (Min/Max 23/43 #2692)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       34
193 Load_Cycle_Count        0x0032   041   041   000    Old_age   Always       -       119380
194 Temperature_Celsius     0x0022   038   063   000    Old_age   Always       -       38 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       12551h+37m+17.098s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       46003325539
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       197852327815

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14440         -
# 2  Extended offline    Completed without error       00%     14350         -
# 3  Short offline       Completed without error       00%     14272         -
# 4  Short offline       Completed without error       00%     14118         -
# 5  Extended offline    Completed without error       00%     14105         -
# 6  Extended offline    Completed: read failure       20%     14028         -
# 7  Short offline       Completed without error       00%     13950         -
# 8  Short offline       Completed without error       00%     13782         -
# 9  Extended offline    Completed without error       00%     13693         -
#10  Short offline       Completed without error       00%     13614         -
#11  Short offline       Completed without error       00%     13398         -
#12  Extended offline    Completed without error       00%     13309         -
#13  Short offline       Completed without error       00%     13230         -
#14  Short offline       Completed without error       00%     13062         -
#15  Extended offline    Completed without error       00%     12972         -
#16  Short offline       Completed without error       00%     12894         -
#17  Short offline       Completed without error       00%     12656         -
#18  Extended offline    Completed without error       00%     12565         -
#19  Short offline       Completed without error       00%     12488         -
#20  Short offline       Completed without error       00%     12319         -
#21  Extended offline    Completed without error       00%     12230         -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 2

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code:
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 45h56m with 0 errors on Thu Jun  2 22:56:56 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/7d0d4614-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/7d9d0f81-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/7e1863a0-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/7e8c4e54-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/7f03f0cd-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/7fa080f2-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/8018dd86-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/80ac89ef-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/811ff591-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0
            gptid/8180504c-1d70-11e5-9640-0cc47a31d318  ONLINE       0     0     0

errors: No known data errors



The values for both Current_Pending_Sector and Offline_Uncorrectable are now 0, and the extended offline test has completed without any errors.
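The two counters mentioned here (plus Reallocated_Sector_Ct) can be read straight from smartctl -A output. A minimal sketch, assuming the attribute table layout posted above with RAW_VALUE in the tenth column; sector_counts is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print NAME=RAW_VALUE for the three sector-health
# attributes in `smartctl -A` output read on stdin:
#   5 Reallocated_Sector_Ct, 197 Current_Pending_Sector, 198 Offline_Uncorrectable
# Attribute rows have at least 10 columns and RAW_VALUE is the 10th.
# The NF check skips other lines that happen to start with 5 (for example
# the selective self-test table in `smartctl -a` output).
sector_counts() {
    awk 'NF >= 10 && ($1 == "5" || $1 == "197" || $1 == "198") { print $2 "=" $10 }'
}

# Usage: smartctl -A /dev/ada1 | sector_counts
```

With the attribute table posted above, it would print Reallocated_Sector_Ct=0, Current_Pending_Sector=0 and Offline_Uncorrectable=0.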

If I RMA this HDD, they will probably just send it back to me and charge me a lot of money, and I'm not interested in that. How should I proceed? I still strongly suspect that this HDD is slowly dying. Or am I just being too paranoid?
 

adrianwi (Guru, joined Oct 15, 2013, 1,231 messages)
If it were me, with only 1 month left on the warranty, I'd still RMA it. I've sent 3 drives back to Seagate this year with similar issues, and all of them were replaced without any problems.

If you had a longer warranty period, it might be worth just keeping an eye on it, but if it starts reporting more errors in 2-3 months' time, you won't have that option.
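Keeping an eye on a drive mostly means noticing when a counter goes up between two checks. A minimal sketch of that comparison; check_growth is a hypothetical helper, and storing the previous value (for example in a file written by the same cron job) is assumed and not shown:

```shell
#!/bin/sh
# Hypothetical helper: compare a sector counter with the value recorded at
# the previous check. Arguments: NAME PREVIOUS CURRENT (integers).
# Prints a WARNING line and returns 1 if the counter has grown.
check_growth() {
    name=$1 prev=$2 curr=$3
    if [ "$curr" -gt "$prev" ]; then
        echo "WARNING: $name rose from $prev to $curr"
        return 1
    fi
    echo "OK: $name is $curr (was $prev)"
}

# Example: check_growth Offline_Uncorrectable 0 8
#   prints: WARNING: Offline_Uncorrectable rose from 0 to 8
```

The non-zero return status lets a calling script decide whether to send a mail.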
 

Bidule0hm (Server Electronics Sorcerer, joined Aug 5, 2013, 3,710 messages)
The thing is, pending sectors aren't confirmed bad sectors, just probable ones. When you ran the long test, the drive probably found those pending sectors to be fine after all and reset the counter to 0.

I don't know if it'll qualify for the RMA though...
 

MichaelBatz
Bidule0hm said:
> The thing is, pending sectors aren't confirmed bad sectors, just probable ones. When you ran the long test, the drive probably found those pending sectors to be fine after all and reset the counter to 0.
>
> I don't know if it'll qualify for the RMA though...

I'm more worried about the 8 Offline uncorrectable sectors. I don't understand how the count could drop back to 0. Uncorrectable is uncorrectable. Maybe there are some reserved sectors?

I'm using your status report script for monitoring. Ever since the Critical Alert mail, I have been getting this mail once in a while, though not every time.


Code:
Cron <root@freenas> PATH="/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/root/bin" /root/scripts/smart_report.sh > /dev/null

awk: can't open file \
        /Serial Number:/{serial=$3} \
        /Temperature_Celsius/{temp=$10} \
        /Power_On_Hours/{onHours=$10} \
        /Start_Stop_Count/{startStop=$10} \
        /Spin_Retry_Count/{spinRetry=$10} \
        /Reallocated_Sector/{reAlloc=$10} \
        /Current_Pending_Sector/{pending=$10} \
        /Offline_Uncorrectable/{offlineUnc=$10} \
        /UDMA_CRC_Error_Count/{crcErrors=$10} \
        /Seek_Error_Rate/{seekErrors=("0x" substr($10,3,4));totalSeeks=("0x" substr($10,7))} \
        /High_Fly_Writes/{hiFlyWr=$10} \
        /Command_Timeout/{cmdTimeout=$10} \
        END {
            if (temp > tempCrit || reAlloc > sectorsCrit || pending > sectorsCrit || offlineUnc > sectorsCrit)
                device=device " " critSymbol;
            else if (temp > tempWarn || reAlloc > 0 || pending > 0 || offlineUnc > 0)
                device=device " " warnSymbol;
            seekErrors=sprintf("%d", seekErrors);
            totalSeeks=sprintf("%d", totalSeeks);
            if (totalSeeks == "0") {
                seekErrors="N/A";
                totalSeeks="N/A";
            }
            if (hiFlyWr == "") hiFlyWr="N/A";
            if (cmdTimeout == "") cmdTimeout="N/A";
            testAge=sprintf("%.0f", (onHours - lastTestHours) / 24);
            printf "|%-6s|%-15s| %s |%5s|%5s|%5s|%7s|%7s|%8s|%6s|%6s|%6s|%7s|%4s|\n",
            device, serial, temp, onHours, startStop, spinRetry, reAlloc, pending, offlineUnc, \
            crcErrors, seekErrors, hiFlyWr, cmdTimeout, testAge;
        }
source line number 1
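In the script, the value passed with -v lastTestHours= comes from an unquoted backquoted grep "# 1" | awk '{print $9}' pipeline. If that pipeline prints more than one word, the shell splits it, awk takes the extra word as its program, and the real program text then sits where an input file name is expected, which produces this kind of "can't open file" error. One way this can happen, offered as an assumption rather than a confirmed diagnosis: grep "# 1" also matches the self-test log summary line "... outdated by newer successful extended offline self-test # 1", which appears only while an old failed test is in the log and the newest entry is an extended test. That would fit an error that started after the read failure and shows up only some of the time. A minimal sketch of a stricter extraction; last_test_hours is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the LifeTime(hours) value of the most recent
# self-test (entry "# 1") from `smartctl -l selftest` output on stdin.
# - "^# 1 " only matches the log entry itself, not the summary line that
#   ends in "self-test # 1".
# - $(NF-1) is the column just before LBA_of_first_error, which stays
#   correct even when the Status text has a different number of words.
# - "exit" stops after the first match, so at most one word is printed.
last_test_hours() {
    awk '/^# 1 / { print $(NF-1); exit }'
}

# In smart_report.sh the same awk program could replace the grep | awk pair:
#   -v lastTestHours=`smartctl -l selftest /dev/${drive} | awk '/^# 1 / { print $(NF-1); exit }'`
```

This has not been tried on FreeNAS; it only changes how the number is extracted, not what the report does with it.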
 

Bidule0hm
MichaelBatz said:
> I'm more worried about the 8 Offline uncorrectable sectors. I don't understand how the count could drop back to 0. Uncorrectable is uncorrectable. Maybe there are some reserved sectors?

My guess is that it's an error in the parser FreeNAS uses to generate the GUI alert, and the drive never had offline uncorrectable sectors.

MichaelBatz said:
> I'm using your status report script for monitoring. Ever since the Critical Alert mail, I have been getting this mail once in a while, though not every time.

Looks like there are bad characters in the file; can you post the output of cat -e /root/scripts/smart_report.sh, please?
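cat -e marks the end of each line with $ and shows control characters in caret notation, so a carriage return left over from a Windows editor appears as ^M just before the $. A minimal sketch that lists such lines directly; find_crlf_lines is a hypothetical helper, and CRLF endings are only one possible kind of bad character:

```shell
#!/bin/sh
# Hypothetical helper: list lines ending in a carriage return (CRLF line
# endings), which `cat -e` would display as "^M$".
# Takes one or more file names as arguments; FNR is the line number
# within each file.
find_crlf_lines() {
    awk '/\r$/ { print FILENAME ": line " FNR ": ends with CR" }' "$@"
}

# Usage: find_crlf_lines /root/scripts/smart_report.sh
```

No output means no line in the file ends with a carriage return.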
 

MichaelBatz
Bidule0hm said:
> My guess is that it's an error in the parser FreeNAS uses to generate the GUI alert, and the drive never had offline uncorrectable sectors.
>
> Looks like there are bad characters in the file; can you post the output of cat -e /root/scripts/smart_report.sh, please?

When I created the RMA, I took a screenshot of the smartctl -a /dev/ada1 output.

https://postimg.org/image/h0bhtddi3/

So I guess it can't be a parser error in the FreeNAS GUI.


Yeah, sure.

Code:
#!/bin/sh$
$
### Parameters ###$
logfile="/tmp/smart_report.tmp"$
email="batz191289@gmail.com"$
subject="SMART Status Report for FreeNAS"$
drives="da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1"$
tempWarn=40$
tempCrit=45$
sectorsCrit=10$
warnSymbol="?"$
critSymbol="!"$
$
### Set email headers ###$
($
    echo "To: ${email}"$
    echo "Subject: ${subject}"$
    echo "Content-Type: text/html"$
    echo "MIME-Version: 1.0"$
    echo -e "\r\n"$
) > ${logfile}$
$
### Set email body ###$
echo "<pre style=\"font-size:14px\">" >> ${logfile}$
$
###### summary ######$
($
    echo ""$
    echo "########## SMART status report summary for all drives ##########"$
    echo ""$
    echo "+------+---------------+----+-----+-----+-----+-------+-------+-------                                                                                                                                                             -+------+------+------+-------+----+"$
    echo "|Device|Serial         |Temp|Power|Start|Spin |ReAlloc|Current|Offline                                                                                                                                                              |UDMA  |Seek  |High  |Command|Last|"$
    echo "|      |               |    |On   |Stop |Retry|Sectors|Pending|Uncorre                                                                                                                                                             c|CRC   |Errors|Fly   |Timeout|Test|"$
    echo "|      |               |    |Hours|Count|Count|       |Sectors|Sectors                                                                                                                                                              |Errors|      |Writes|Count  |Age |"$
    echo "+------+---------------+----+-----+-----+-----+-------+-------+-------                                                                                                                                                             -+------+------+------+-------+----+"$
) >> ${logfile}$
for drive in $drives$
do$
    ($
        smartctl -A -i -v 7,hex48 /dev/${drive} | \$
        awk -v device=${drive} -v tempWarn=${tempWarn} -v tempCrit=${tempCrit} -                                                                                                                                                             v sectorsCrit=${sectorsCrit} \$
        -v warnSymbol=${warnSymbol} -v critSymbol=${critSymbol} \$
        -v lastTestHours=`smartctl -l selftest /dev/${drive} | grep "# 1" | awk                                                                                                                                                              '{print $9}'` '\$
        /Serial Number:/{serial=$3} \$
        /Temperature_Celsius/{temp=$10} \$
        /Power_On_Hours/{onHours=$10} \$
        /Start_Stop_Count/{startStop=$10} \$
        /Spin_Retry_Count/{spinRetry=$10} \$
        /Reallocated_Sector/{reAlloc=$10} \$
        /Current_Pending_Sector/{pending=$10} \$
        /Offline_Uncorrectable/{offlineUnc=$10} \$
        /UDMA_CRC_Error_Count/{crcErrors=$10} \$
        /Seek_Error_Rate/{seekErrors=("0x" substr($10,3,4));totalSeeks=("0x" sub                                                                                                                                                             str($10,7))} \$
        /High_Fly_Writes/{hiFlyWr=$10} \$
        /Command_Timeout/{cmdTimeout=$10} \$
        END {$
            if (temp > tempCrit || reAlloc > sectorsCrit || pending > sectorsCri                                                                                                                                                             t || offlineUnc > sectorsCrit)$
                device=device " " critSymbol;$
            else if (temp > tempWarn || reAlloc > 0 || pending > 0 || offlineUnc                                                                                                                                                              > 0)$
                device=device " " warnSymbol;$
            seekErrors=sprintf("%d", seekErrors);$
            totalSeeks=sprintf("%d", totalSeeks);$
            if (totalSeeks == "0") {$
                seekErrors="N/A";$
                totalSeeks="N/A";$
            }$
            if (hiFlyWr == "") hiFlyWr="N/A";$
            if (cmdTimeout == "") cmdTimeout="N/A";$
            testAge=sprintf("%.0f", (onHours - lastTestHours) / 24);$
            printf "|%-6s|%-15s| %s |%5s|%5s|%5s|%7s|%7s|%8s|%6s|%6s|%6s|%7s|%4s                                                                                                                                                             |\n",$
            device, serial, temp, onHours, startStop, spinRetry, reAlloc, pendin                                                                                                                                                             g, offlineUnc, \$
            crcErrors, seekErrors, hiFlyWr, cmdTimeout, testAge;$
        }'$
    ) >> ${logfile}$
done$
($
    echo "+------+---------------+----+-----+-----+-----+-------+-------+-------                                                                                                                                                             -+------+------+------+-------+----+"$
    echo ""$
    echo ""$
) >> ${logfile}$
$
###### for each drive ######$
for drive in $drives$
do$
    brand=`smartctl -i /dev/${drive} | grep "Model Family" | awk '{print $3, $4,                                                                                                                                                              $5}'`$
    serial=`smartctl -i /dev/${drive} | grep "Serial Number" | awk '{print $3}'`                                                                                                                                                             $
    ($
        echo ""$
        echo "########## SMART status report for ${drive} drive (${brand}: ${ser                                                                                                                                                             ial}) ##########"$
        smartctl -H -A -l error /dev/${drive}$
        smartctl -l selftest /dev/${drive} | grep "# 1 \|Num" | cut -c6-$
        echo ""$
        echo ""$
    ) >> ${logfile}$
done$
sed -i '' -e '/smartctl 6.3/d' ${logfile}$
sed -i '' -e '/Copyright/d' ${logfile}$
sed -i '' -e '/=== START OF READ/d' ${logfile}$
sed -i '' -e '/SMART Attributes Data/d' ${logfile}$
sed -i '' -e '/Vendor Specific SMART/d' ${logfile}$
sed -i '' -e '/SMART Error Log Version/d' ${logfile}$
echo "</pre>" >> ${logfile}$
$
### Send report ###$
sendmail -t < ${logfile}$
rm ${logfile}$
freenas# cat -e /root/scripts/smart_report.sh
#!/bin/sh$
$
### Parameters ###$
logfile="/tmp/smart_report.tmp"$
email="batz191289@gmail.com"$
subject="SMART Status Report for FreeNAS"$
drives="da0 da1 da2 da3 da4 da5 da6 da7 ada0 ada1"$
tempWarn=40$
tempCrit=45$
sectorsCrit=10$
warnSymbol="?"$
critSymbol="!"$
$
### Set email headers ###$
($
    echo "To: ${email}"$
    echo "Subject: ${subject}"$
    echo "Content-Type: text/html"$
    echo "MIME-Version: 1.0"$
    echo -e "\r\n"$
) > ${logfile}$
$
### Set email body ###$
echo "<pre style=\"font-size:14px\">" >> ${logfile}$
$
###### summary ######$
($
    echo ""$
    echo "########## SMART status report summary for all drives ##########"$
    echo ""$
    echo "+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"$
    echo "|Device|Serial         |Temp|Power|Start|Spin |ReAlloc|Current|Offline |UDMA  |Seek  |High  |Command|Last|"$
    echo "|      |               |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|CRC   |Errors|Fly   |Timeout|Test|"$
    echo "|      |               |    |Hours|Count|Count|       |Sectors|Sectors |Errors|      |Writes|Count  |Age |"$
    echo "+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"$
) >> ${logfile}$
for drive in $drives$
do$
    ($
        smartctl -A -i -v 7,hex48 /dev/${drive} | \$
        awk -v device=${drive} -v tempWarn=${tempWarn} -v tempCrit=${tempCrit} -v sectorsCrit=${sectorsCrit} \$
        -v warnSymbol=${warnSymbol} -v critSymbol=${critSymbol} \$
        -v lastTestHours=`smartctl -l selftest /dev/${drive} | grep "# 1" | awk '{print $9}'` '\$
        /Serial Number:/{serial=$3} \$
        /Temperature_Celsius/{temp=$10} \$
        /Power_On_Hours/{onHours=$10} \$
        /Start_Stop_Count/{startStop=$10} \$
        /Spin_Retry_Count/{spinRetry=$10} \$
        /Reallocated_Sector/{reAlloc=$10} \$
        /Current_Pending_Sector/{pending=$10} \$
        /Offline_Uncorrectable/{offlineUnc=$10} \$
        /UDMA_CRC_Error_Count/{crcErrors=$10} \$
        /Seek_Error_Rate/{seekErrors=("0x" substr($10,3,4));totalSeeks=("0x" substr($10,7))} \$
        /High_Fly_Writes/{hiFlyWr=$10} \$
        /Command_Timeout/{cmdTimeout=$10} \$
        END {$
            if (temp > tempCrit || reAlloc > sectorsCrit || pending > sectorsCrit || offlineUnc > sectorsCrit)$
                device=device " " critSymbol;$
            else if (temp > tempWarn || reAlloc > 0 || pending > 0 || offlineUnc > 0)$
                device=device " " warnSymbol;$
            seekErrors=sprintf("%d", seekErrors);$
            totalSeeks=sprintf("%d", totalSeeks);$
            if (totalSeeks == "0") {$
                seekErrors="N/A";$
                totalSeeks="N/A";$
            }$
            if (hiFlyWr == "") hiFlyWr="N/A";$
            if (cmdTimeout == "") cmdTimeout="N/A";$
            testAge=sprintf("%.0f", (onHours - lastTestHours) / 24);$
            printf "|%-6s|%-15s| %s |%5s|%5s|%5s|%7s|%7s|%8s|%6s|%6s|%6s|%7s|%4s|\n",$
            device, serial, temp, onHours, startStop, spinRetry, reAlloc, pending, offlineUnc, \$
            crcErrors, seekErrors, hiFlyWr, cmdTimeout, testAge;$
        }'$
    ) >> ${logfile}$
done$
($
    echo "+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+"$
    echo ""$
    echo ""$
) >> ${logfile}$
$
###### for each drive ######$
for drive in $drives$
do$
    brand=`smartctl -i /dev/${drive} | grep "Model Family" | awk '{print $3, $4, $5}'`$
    serial=`smartctl -i /dev/${drive} | grep "Serial Number" | awk '{print $3}'`$
    ($
        echo ""$
        echo "########## SMART status report for ${drive} drive (${brand}: ${serial}) ##########"$
        smartctl -H -A -l error /dev/${drive}$
        smartctl -l selftest /dev/${drive} | grep "# 1 \|Num" | cut -c6-$
        echo ""$
        echo ""$
    ) >> ${logfile}$
done$
sed -i '' -e '/smartctl 6.3/d' ${logfile}$
sed -i '' -e '/Copyright/d' ${logfile}$
sed -i '' -e '/=== START OF READ/d' ${logfile}$
sed -i '' -e '/SMART Attributes Data/d' ${logfile}$
sed -i '' -e '/Vendor Specific SMART/d' ${logfile}$
sed -i '' -e '/SMART Error Log Version/d' ${logfile}$
echo "</pre>" >> ${logfile}$
$
### Send report ###$
sendmail -t < ${logfile}$
rm ${logfile}$
 

Bidule0hm
MichaelBatz said:
> When I created the RMA, I took a screenshot of the smartctl -a /dev/ada1 output.

Wow, that's interesting. I don't know what's going on here...

MichaelBatz said:
> Yeah, sure.

That was easy, however: some of your lines are wrapped. I guess you copied the script using the web GUI CLI and/or an editor other than nano. Try what I recommend in the Script basics section of my thread to see if it solves the problem ;)
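The wrapped lines in the posted cat -e output are all long ones (the table separator echo lines and the awk invocation). A minimal sketch that lists every line longer than a given width so those lines can be checked by hand; long_lines is a hypothetical helper, and the 80-column default is an assumption about the terminal the script was copied from:

```shell
#!/bin/sh
# Hypothetical helper: list lines of a file that are longer than WIDTH
# characters (default 80), the lines most at risk of being wrapped when a
# script is copied out of a terminal window.
# Arguments: FILE [WIDTH]
long_lines() {
    awk -v max="${2:-80}" 'length($0) > max + 0 { print FNR ": " length($0) " chars" }' "$1"
}

# Usage: long_lines /root/scripts/smart_report.sh
```

Each reported line can then be compared against the original script to confirm it is still one unbroken line.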
 

MichaelBatz
Bidule0hm said:
> That was easy, however: some of your lines are wrapped. I guess you copied the script using the web GUI CLI and/or an editor other than nano. Try what I recommend in the Script basics section of my thread to see if it solves the problem ;)

I don't remember what I did. I have been using it for a long time, and it worked perfectly until that HDD started failing. But I will check it out!
 

adrianwi
One of my drives reported 8 uncorrectable sectors for a few days/weeks, then didn't for a few more. It then reported 8 again, then 16, and then it went back!
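A counter that rises and falls like this is easy to miss if only the latest value is looked at. A minimal sketch that summarizes a logged history of one counter, assuming some cron job already appends one raw value per line, oldest first (that logging is not shown); counter_summary is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: read a history of one counter on stdin (one integer
# per line, oldest first) and print its peak and latest values, so a
# counter that drops back to 0 is still visible as having been non-zero.
# Prints nothing for empty input.
counter_summary() {
    awk 'NR == 1 || $1 + 0 > peak { peak = $1 + 0 }
         { latest = $1 + 0 }
         END { if (NR) print "peak=" peak " latest=" latest }'
}

# Example with an illustrative history:
#   printf '0\n8\n0\n8\n16\n0\n' | counter_summary
#   prints: peak=16 latest=0
```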
 

MichaelBatz
adrianwi said:
> One of my drives reported 8 uncorrectable sectors for a few days/weeks, then didn't for a few more. It then reported 8 again, then 16, and then it went back!

Was that also a Seagate?

I have never experienced anything like this with other drives.
 

adrianwi
Yes, a Seagate ST4000VN000 ;)
 

MichaelBatz
I have been in contact with the retailer of the drive. They have agreed to replace it! Thanks everybody for your answers :)
 