Unrecoverable error, clean report, is it safe to ignore?

Status
Not open for further replies.

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
I got the "One or more devices has experienced an unrecoverable error" message that a few people have posted about here. Are there scenarios where it's safe to ignore this error and NOT replace the drive? The drive is a WD 3TB Red and has only been in service for about 3 months.

I'm guessing the error was detected during a scrub? zpool status output shows 2 checksum errors that were repaired. SMART self-test reports no errors and SMART attributes all seem to be within reasonable range compared to my other drives.

Was hoping to get some advice on whether it's OK to not replace this drive? Are there other tests/diags I can run to make sure? Is it possible the error can be caused by something else (like a sudden power outage or an unclean shutdown) and not caused by an imminent drive failure?

The reason I ask about the unclean shutdown is because about a week ago, I had to shut down the server and it hung at the very end when it was syncing the drives so I had to hard power it down. Could that have caused a checksum error?

Has anyone had a similar experience and found the drive to be OK?

Thanks

Code:
zpool status -x
  pool: zpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 12K in 4h2m with 0 errors on Sun May 17 04:02:24 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        zpool                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/c8f572be-ba89-11e4-9e40-d0509946c1f0  ONLINE       0     0     0
            gptid/c9b51823-ba89-11e4-9e40-d0509946c1f0  ONLINE       0     0     0
            gptid/ca707c45-ba89-11e4-9e40-d0509946c1f0  ONLINE       0     0     0
            gptid/cb2a0fae-ba89-11e4-9e40-d0509946c1f0  ONLINE       0     0     2
            gptid/cb81bbff-ba89-11e4-9e40-d0509946c1f0  ONLINE       0     0     0

errors: No known data errors
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Please give us the output of smartctl -a for each drive (post them between code tags or via pastebin please) and your hardware list so we can help.
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Here are the smartctl results. I didn't post them at first since they're a bit lengthy. The drive that had the checksum errors was /dev/da2.

Hardware setup is pretty straightforward.

Motherboard: ASRock C2750D4i (Avoton) server board, using LSI 9240-8i SATA controllers.
Memory: Crucial 16GB (8GBx2) DDR3 PC3-12800 ECC UDIMM

Code:
smartctl -a /dev/da0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number: <snip>
LU WWN Device Id: 5 0014ee 209505929
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 18 10:53:09 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (40500) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 406) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  176  175  021  Pre-fail  Always  -  6200
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  17
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  098  098  000  Old_age  Always  -  2016
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  17
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  16
193 Load_Cycle_Count  0x0032  192  192  000  Old_age  Always  -  26971
194 Temperature_Celsius  0x0022  121  114  000  Old_age  Always  -  29
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Code:
smartctl -a /dev/da1
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  <snip>
LU WWN Device Id: 5 0014ee 25f3d6e3a
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 18 10:55:04 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (39900) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 400) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  180  180  021  Pre-fail  Always  -  5983
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  16
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  098  098  000  Old_age  Always  -  2016
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  16
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  15
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  6
194 Temperature_Celsius  0x0022  121  114  000  Old_age  Always  -  29
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code:
smartctl -a /dev/da2
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  <snip>
LU WWN Device Id: 5 0014ee 209e76210
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 18 10:55:47 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (38760) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 389) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  1
  3 Spin_Up_Time  0x0027  176  176  021  Pre-fail  Always  -  6166
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  17
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  098  098  000  Old_age  Always  -  2016
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  17
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  16
193 Load_Cycle_Count  0x0032  200  200  000  Old_age  Always  -  5
194 Temperature_Celsius  0x0022  120  114  000  Old_age  Always  -  30
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  2014  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Code:
smartctl -a /dev/da3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  <snip>
LU WWN Device Id: 5 0014ee 2b3fbedec
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 18 10:56:23 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (39240) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 394) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  178  178  021  Pre-fail  Always  -  6083
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  17
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  098  098  000  Old_age  Always  -  2015
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  17
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  16
193 Load_Cycle_Count  0x0032  192  192  000  Old_age  Always  -  26954
194 Temperature_Celsius  0x0022  119  113  000  Old_age  Always  -  31
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Code:
smartctl -a /dev/da4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  <snip>
LU WWN Device Id: 5 0014ee 2b3fbe695
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 18 10:57:46 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (38580) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 387) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  175  174  021  Pre-fail  Always  -  6250
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  23
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  095  095  000  Old_age  Always  -  3693
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  23
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  17
193 Load_Cycle_Count  0x0032  199  199  000  Old_age  Always  -  5826
194 Temperature_Celsius  0x0022  123  111  000  Old_age  Always  -  27
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  100  253  000  Old_age  Offline  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, you didn't set up any scheduled SMART tests, so these drives were never scanned for errors. It's strongly recommended to run a short test at least every week and a long test at least every month. See the tasks section in the manual to add the tests.
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Thanks Bidule0hm, I will add the checks going forward. I did run a short test for the drive with the checksum errors (da2) and it returned no errors:
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 2014 -

I guess my question is still the same: is this enough to say that the disk is safe to keep using until future SMART self-tests tell me otherwise, or is this drive in imminent danger of a complete failure?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Short tests are probably close to meaningless. I only run them regularly because they're "free".

If a long test reveals no further problems, I wouldn't worry too much. Just a Raw Read Error Rate at 1 isn't anything special. One of my Reds has been floating around 1-2 ever since I burned them in and everything's moving along nicely.
Of course, it's always good to have a spare available, preferably burned-in.

Also: ada3 and ada4 need their idle timers fixed. They seem to be from the bad batch that had it set to 8 seconds. You should run wdidle on them and set the timer to 300 seconds.
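
A short idle timer tends to show up in SMART as a Load_Cycle_Count that is large relative to Power_On_Hours. As a rough sketch, the ratio can be computed from the attribute table. The function name `load_cycles_per_hour` is made up for illustration, and the sketch assumes the ten-column table that smartctl prints for these drives, with a plain number as the raw value:

```shell
# Read a smartctl attribute table on stdin and print load cycles per
# power-on hour. Assumes RAW_VALUE is the 10th whitespace-separated field.
load_cycles_per_hour() {
  awk '
    $2 == "Power_On_Hours"   { hours  = $10 }
    $2 == "Load_Cycle_Count" { cycles = $10 }
    END {
      if (hours + 0 > 0) printf "%.1f\n", cycles / hours
      else print "n/a"
    }
  '
}

# Example use on a live system:
#   smartctl -A /dev/da0 | load_cycles_per_hour
```

Going by the output posted earlier in the thread, da0 (26971 cycles in 2016 hours) and da3 (26954 in 2015 hours) come out at roughly 13 cycles per hour, da4 at about 1.6, and da1 and da2 close to zero.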
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
A short test isn't very useful; you need to run a long test to be sure (it'll take several hours, but you can test all drives simultaneously and still use the NAS in the meantime, so it's not a big deal) :)

From what I've seen so far you're OK, but only the long test will confirm it ;)
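
For reference, starting the long test on every drive could look like the sketch below. `smartctl -t long` is the usual way to start an extended self-test; the wrapper function `start_long_tests` and the `DRY_RUN` switch are assumptions added here for illustration only.

```shell
# Start an extended (long) SMART self-test on each device given as an argument.
# With DRY_RUN=1 the commands are printed instead of executed.
start_long_tests() {
  for dev in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "smartctl -t long $dev"
    else
      smartctl -t long "$dev"
    fi
  done
}

# Example use for the five drives in this thread:
#   start_long_tests /dev/da0 /dev/da1 /dev/da2 /dev/da3 /dev/da4
```

The tests run inside the drives, so they proceed in parallel. Going by the recommended polling times in the output above (387 to 406 minutes), each one should take roughly six and a half hours, after which `smartctl -l selftest /dev/da2` prints the self-test log for a drive.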
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Thanks everyone.

@Ericloewe, I did end up running the long test and da4 generated an error. The rest of the drives returned no errors.

Code:
[root@nas01] ~# smartctl -a /dev/da4
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Western Digital Red
Device Model:  WDC WD30EFRX-68EUZN0
Serial Number:  <snip>
LU WWN Device Id: 5 0014ee 2b3fbe695
Firmware Version: 80.00A80
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  5400 rpm
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon May 25 15:33:08 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)   Offline data collection activity
           was never started.
           Auto Offline Data Collection: Disabled.
Self-test execution status:  ( 113)   The previous self-test completed having
           the read element of the test failed.
Total time to complete Offline
data collection:      (38580) seconds.
Offline data collection
capabilities:         (0x7b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  2) minutes.
Extended self-test routine
recommended polling time:     ( 387) minutes.
Conveyance self-test routine
recommended polling time:     (  5) minutes.
SCT capabilities:     (0x703d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x002f  200  200  051  Pre-fail  Always  -  0
  3 Spin_Up_Time  0x0027  175  174  021  Pre-fail  Always  -  6250
  4 Start_Stop_Count  0x0032  100  100  000  Old_age  Always  -  23
  5 Reallocated_Sector_Ct  0x0033  200  200  140  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x002e  200  200  000  Old_age  Always  -  0
  9 Power_On_Hours  0x0032  095  095  000  Old_age  Always  -  3865
 10 Spin_Retry_Count  0x0032  100  253  000  Old_age  Always  -  0
 11 Calibration_Retry_Count 0x0032  100  253  000  Old_age  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  23
192 Power-Off_Retract_Count 0x0032  200  200  000  Old_age  Always  -  17
193 Load_Cycle_Count  0x0032  199  199  000  Old_age  Always  -  5826
194 Temperature_Celsius  0x0022  122  111  000  Old_age  Always  -  28
196 Reallocated_Event_Count 0x0032  200  200  000  Old_age  Always  -  0
197 Current_Pending_Sector  0x0032  200  200  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0030  100  253  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x0032  200  200  000  Old_age  Always  -  0
200 Multi_Zone_Error_Rate  0x0008  200  200  000  Old_age  Offline  -  1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed: read failure  10%  3752  1334243920

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Could this be caused by the idle timer bug you mentioned, or is this a real read failure? If it's a real failure, I'll most likely return the drive since it's still under warranty.

Also, about fixing the idle timer for ada3 and ada4: WD doesn't seem to have a version of the tool for FreeBSD. Will I need to remove the drives, update the timer on a DOS/Windows machine, and then put the drives back in? Or is there an easier way to do it?

Thanks
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, the SMART test failed; that's a pretty solid indication that the drive is going downhill.
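
To spot this kind of result without reading the whole report, the self-test log can be filtered for failed entries. This is only a sketch: the function name `failed_selftests` is invented here, and it simply assumes that failed tests contain the word "failure" in their status, as the "Completed: read failure" entry above does.

```shell
# Read a SMART self-test log on stdin and print only the entries whose
# status mentions a failure (for example "Completed: read failure").
failed_selftests() {
  awk '/^# *[0-9]+/ && /failure/ { print }'
}

# Example use on a live system:
#   smartctl -l selftest /dev/da4 | failed_selftests
```

No output means no logged self-test matched; it does not by itself prove the drive is healthy.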
 

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
Definitely! I meant to say that my original question was whether to replace da2, which FreeNAS reported as having 2 checksum errors. Since SMART came back with no errors for that one, I'll just keep that drive and RMA da4.

Thanks for everyone's feedback and assistance.
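
For anyone following the same route of keeping a drive after repaired checksum errors, a small sketch for watching the counters is below. The function name `vdevs_with_errors` is made up for illustration, and it assumes the NAME / STATE / READ / WRITE / CKSUM column layout shown in the `zpool status` output at the top of the thread.

```shell
# Read `zpool status` output on stdin and print every pool, vdev, or disk
# line whose READ, WRITE, or CKSUM counter is not 0.
vdevs_with_errors() {
  awk '
    NF >= 5 &&
    $2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|OFFLINE|REMOVED)$/ &&
    ($3 != "0" || $4 != "0" || $5 != "0") { print $1, $3, $4, $5 }
  '
}

# Example use on a live system:
#   zpool status zpool | vdevs_with_errors
# Reset the counters and verify the pool again, as the status text suggests:
#   zpool clear zpool
#   zpool scrub zpool
```

If the same disk shows new checksum errors after the counters are cleared and a fresh scrub completes, that is a stronger hint that the drive, cable, or controller port needs attention.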
 