1 Currently unreadable (pending) sectors

twisted · Sep 25, 2014

Hi All,

Thanks in advance for any advice anyone can offer.

I am getting this error repeated every 30 minutes (1 email alert, but none since; I assume because it's not getting worse).

Sep 25 22:30:33 freenas smartd[2371]: Device: /dev/ada3, 1 Currently unreadable (pending) sectors

I'm running smartctl -t long /dev/ada3. I guess it's still running (do the results appear somewhere eventually? :)). Is this something to be concerned about?

smartctl -A /dev/ada3 output below. I've got a really LOW Start_Stop_Count, which suggests my drives aren't going to sleep? Not sure if that can be a problem.

Code:

[root@freenas] ~# smartctl -A /dev/ada3
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   185   175   021    Pre-fail  Always       -       5741
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       20
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8302
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1725
194 Temperature_Celsius     0x0022   114   109   000    Old_age   Always       -       36
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       1

Should I be worried about this? :)

Ericloewe · Sep 25, 2014

I'd recommend you start making arrangements for a new drive, since that one is throwing bad sectors.

Did you test it before putting it into production?

Also, it reached 41°C at least once, so make sure you have proper cooling.
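
The 41°C figure does not appear in the raw value (36°C); it seems to come from the WORST column. A small sketch of that arithmetic, assuming the drive stores the normalized temperature as 150 minus °C. That offset is inferred from the current reading in the table above (VALUE 114, raw 36), not taken from vendor documentation.

```python
# Assumption: normalized Temperature_Celsius VALUE = 150 - degrees C.
# The offset is inferred from the current reading (VALUE 114, RAW_VALUE 36),
# not from vendor documentation.
OFFSET = 150


def normalized_to_celsius(normalized: int, offset: int = OFFSET) -> int:
    """Convert a normalized temperature VALUE or WORST back to degrees C."""
    return offset - normalized


current_c = normalized_to_celsius(114)  # VALUE column
hottest_c = normalized_to_celsius(109)  # WORST column: lowest normalized value seen
print(f"current: {current_c} C, hottest recorded: {hottest_c} C")
```

With those numbers the script prints a current temperature of 36°C and a hottest recorded temperature of 41°C.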

warri · Sep 25, 2014

A long SMART test usually takes several hours. You can see the recommended polling time and, once the test has finished, the results with:

Code:

smartctl -a /dev/adaX

Example output:

Code:

[...]
Extended self-test routine
recommended polling time:     ( 447) minutes.
[...]
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9289         -
[...]
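
If you check several drives regularly, a small parser for the self-test log saves some squinting. The sketch below is a hypothetical Python helper (not part of smartmontools); it assumes the column layout shown above and treats any status other than "Completed without error" as not passed. The sample rows are taken from this thread.

```python
import re
from typing import Optional

# Hypothetical helper (not part of smartmontools): parse one row of the
# "SMART Self-test log" table printed by `smartctl -a`.
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+(?P<desc>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)


def parse_selftest_row(line: str) -> Optional[dict]:
    """Return the fields of one self-test log row, or None if it is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    lba = m.group("lba")
    return {
        "num": int(m.group("num")),
        "description": m.group("desc"),
        "status": m.group("status"),
        "remaining_pct": int(m.group("remaining")),
        "lifetime_hours": int(m.group("hours")),
        # "-" means the test recorded no failing LBA
        "first_error_lba": None if lba == "-" else int(lba),
        "passed": m.group("status") == "Completed without error",
    }


# Sample rows copied from this thread
good = parse_selftest_row(
    "# 1  Extended offline    Completed without error       00%      9289         -")
bad = parse_selftest_row(
    "# 1  Extended offline    Completed: read failure       60%      8304         2400065936")
print(good["passed"], good["first_error_lba"])  # True None
print(bad["passed"], bad["first_error_lba"])    # False 2400065936
```

The LBA of the first error is the number you would need for any attempt to rewrite the bad sector.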

One pending sector usually is not critical, but it may be an indicator of a failing drive. I can see other signs of a failing drive in your SMART values, e.g. a nonzero Raw_Read_Error_Rate (6) and Multi_Zone_Error_Rate (1).

If you have sufficient redundancy (e.g. RAID-Z2), you could keep the disk running, check regularly whether the SMART values get worse, and replace the drive when they do. If you are only running RAID-Z1, however, I'd replace the disk as soon as possible.

EDIT:

Ericloewe said: Did you test it before putting it into production?

According to the Power_On_Hours value, that was roughly a year ago. I guess it didn't show signs of failure in the beginning, since the OP only started getting notifications now.
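
The "roughly a year" estimate follows from the raw Power_On_Hours value of 8302 in the first post. A quick check, assuming the raw value is in hours as the attribute name says and that the drive has run more or less continuously since it was installed:

```python
POWER_ON_HOURS = 8302  # raw Power_On_Hours of /dev/ada3 from the first post

days = POWER_ON_HOURS / 24   # 24 powered-on hours per day
years = days / 365.25        # average year length, including leap years
print(f"{POWER_ON_HOURS} h = {days:.0f} days = {years:.2f} years")  # 8302 h = 346 days = 0.95 years
```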

twisted · Sep 25, 2014

OK, yes, you're right, thanks. smartctl -a showed the test result:

Code:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%      8304         2400065936

I might try what Joshua posted here to see if it fixes the bad sector: http://forums.freenas.org/index.php?threads/1-currently-unreadable-pending-sectors.10213/#post-45550. I'm not 100% sure what that does yet, so I might have to find out before I run it :)

I guess I'll also watch it over time and see if it gets worse. It's just 1 of the 4 drives; I'll get a replacement ready. I'm running RAID10 and the other drives look OK.

Code:

[root@freenas] ~# smartctl -A /dev/ada0 | grep Error
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
[root@freenas] ~# smartctl -A /dev/ada1 | grep Error
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
[root@freenas] ~# smartctl -A /dev/ada2 | grep Error
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
[root@freenas] ~# smartctl -A /dev/ada3 | grep Error
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5
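
Comparing four grep outputs by eye works, but a short script makes the odd drive out obvious. A minimal sketch with two of the outputs above pasted in as sample data (in practice you would capture `smartctl -A` for each drive); the helper names are hypothetical, and the parser assumes decimal raw values.

```python
# Sample data copied from the `smartctl -A /dev/adaN | grep Error` output above.
SAMPLES = {
    "ada0": """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
""",
    "ada3": """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       5
""",
}


def parse_attributes(text: str) -> dict:
    """Map ATTRIBUTE_NAME -> RAW_VALUE string for each attribute row of `smartctl -A`."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE may contain spaces, e.g. "32 (Min/Max 16/57)".
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def nonzero_error_counters(attrs: dict) -> dict:
    """Return error-related attributes whose leading raw number is not zero."""
    result = {}
    for name, raw in attrs.items():
        value = int(raw.split()[0])  # assumes a decimal raw value
        if "Error" in name and value != 0:
            result[name] = value
    return result


for drive, text in SAMPLES.items():
    print(drive, nonzero_error_counters(parse_attributes(text)))
```

This reports nothing for ada0 and Raw_Read_Error_Rate 6 plus Multi_Zone_Error_Rate 5 for ada3. The `"Error" in name` filter mirrors the `grep Error` above. Note that ada3's Multi_Zone_Error_Rate was 1 in the first post, so it has grown since then.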

warri · Sep 25, 2014

In addition to the pending sector, your long SMART test failed. This is a very strong indicator that you should replace the drive immediately.

esamett · Sep 26, 2014

Hello: I just replaced a bad hard drive with a unit that had been in service in my PC and was considered "good." My pool was resilvering when I got this email message:
"Currently unreadable (pending) sectors"
Have I installed a bad drive, or is this normal during the resilver process?

Thanks.

ps: I jumped onto this thread since I thought it matched my current issue.

Yatti420 · Sep 27, 2014

Post the result of smartctl -a on the disk in question.


esamett · Sep 28, 2014

Thank you.

Code:

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       4547
  2 Throughput_Performance  0x0026   252   252   000    Old_age   Always       -       0
  3 Spin_Up_Time            0x0023   067   066   025    Pre-fail  Always       -       10121
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2637
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18196
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       415
181 Program_Fail_Cnt_Total  0x0022   094   094   000    Old_age   Always       -       136493158
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       7506
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   043   000    Old_age   Always       -       32 (Min/Max 16/57)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       7619
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2719
[root@freenas ~]#

I found this posting on correcting the problem:

Code:

I'm going to outline my steps here for future reference.
Run a long self-test on the disk ada#; it should fail somewhere:

Code (text):

smartctl -t long /dev/ada#
When it has finished, check the self-test log for the unreadable sector (LBA_of_first_error); let's call it 'X':

Code (text):

smartctl -a /dev/ada#   # -a = all SMART info, including the self-test log; -A = attributes only
Change the sysctl and try writing to the sector. Replace 'X' below with that LBA:

Code (text):

sysctl kern.geom.debugflags=16   # allow writing to a disk that is already in use
# X counts logical sectors (usually 512 bytes; see "Sector Size" in smartctl -a),
# so bs must equal the sector size; with bs=4096, seek=X would start 8 times too far in
dd if=/dev/zero of=/dev/ada# bs=512 count=1 seek=X conv=noerror,sync
Check the SMART information to see whether 'Current_Pending_Sector' went to 0. If there are multiple unreadable sectors, you may need to repeat the steps above several times.

Code (text):

smartctl -A /dev/ada#
Now run another SMART test; hopefully it completes without error.

Code (text):

smartctl -t long /dev/ada#
smartctl -a /dev/ada#
Now run a scrub (either from the GUI or with 'zpool scrub poolname').
Check the scrub's status; hopefully it fixes some errors.

Code (text):

zpool status -v poolname

But I don't think it did the trick for me after one iteration:

Code:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       30%     18201         2614945456
# 2  Short offline       Completed without error       00%      1310         -

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed_read_failure [30% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.

[root@freenas ~]# sysctl kern.geom.debugflags=16
kern.geom.debugflags: 0 -> 16
[root@freenas ~]# dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=2614945456 conv=noerror,sync
dd: /dev/ada0: Input/output error
1+0 records in
0+0 records out
0 bytes transferred in 0.000090 secs (0 bytes/sec)
[root@freenas ~]#
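
A likely explanation for the immediate Input/output error: dd's seek= counts blocks of bs bytes, while the LBA in the self-test log counts the drive's 512-byte logical sectors. With bs=4096 the write starts 8 times further into the disk than intended, which is past the end of this 2 TB drive. A quick check of the arithmetic, using numbers from this thread; the corrected dd line is only printed, not run.

```python
import shlex

# Numbers from this thread: the LBA from the failed self-test and the capacity
# smartctl reports for the SAMSUNG HD204UI (512-byte logical sectors).
LBA = 2614945456
CAPACITY_BYTES = 2_000_398_934_016
LOGICAL_SECTOR = 512


def dd_offset(seek: int, bs: int) -> int:
    """Byte offset where dd starts writing: seek skips `seek` blocks of `bs` bytes."""
    return seek * bs


for bs in (4096, LOGICAL_SECTOR):
    offset = dd_offset(LBA, bs)
    where = "inside" if offset + bs <= CAPACITY_BYTES else "past the end of"
    print(f"bs={bs}: offset {offset:,} bytes, {where} the disk")

# Corrected command (printed only, not executed): block size = logical sector size.
fixed = ["dd", "if=/dev/zero", "of=/dev/ada0", f"bs={LOGICAL_SECTOR}",
         "count=1", f"seek={LBA}", "conv=noerror,sync"]
print(shlex.join(fixed))
```

With bs=4096 the offset is 10,710,816,587,776 bytes, far beyond the 2,000,398,934,016-byte capacity; with bs=512 it is 1,338,852,073,472 bytes, inside the disk. That FreeBSD reports the out-of-range write as an Input/output error is an inference here, not something verified in this thread.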

I then read that writing zeros to every sector would force the disk to remap bad sectors. I did this with Parted Magic. A long test showed "no errors." I reinstalled the drive in my server. This is the smartctl -a output, without re-running the long test. I still see the pending sector entry:

Code:

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F4 EG (AF)
Device Model:     SAMSUNG HD204UI
Serial Number:    S2H7JD2ZB08331
LU WWN Device Id: 5 0024e9 004508109
Firmware Version: 1AQ10001
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 6
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun Sep 28 08:39:22 2014 PDT

==> WARNING: Using smartmontools or hdparm with this
drive may result in data loss due to a firmware bug.
****** THIS DRIVE MAY OR MAY NOT BE AFFECTED! ******
Buggy and fixed firmware report same version number!
See the following web pages for details:
http://knowledge.seagate.com/articles/en_US/FAQ/223571en
http://sourceforge.net/apps/trac/smartmontools/wiki/SamsungF4EGBadBlocks

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (20580) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 343) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       4547
  2 Throughput_Performance  0x0026   055   055   000    Old_age   Always       -       18806
  3 Spin_Up_Time            0x0023   067   066   025    Pre-fail  Always       -       10082
  4 Start_Stop_Count        0x0032   098   098   000    Old_age   Always       -       2641
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       18220
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       419
181 Program_Fail_Cnt_Total  0x0022   094   094   000    Old_age   Always       -       136493310
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       7506
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   043   000    Old_age   Always       -       31 (Min/Max 16/57)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       7620
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       2723

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     18214         -
# 2  Extended offline    Completed: read failure       30%     18201         2614945456
# 3  Short offline       Completed without error       00%      1310         -
1 of 1 failed self-tests are outdated by newer successful extended offline self-test # 1

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Completed [00% left] (0-65535)
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root@freenas ~]#
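
In this output the raw Current_Pending_Sector value is actually 0 (it was 1 in the earlier table); only the WORST column (100, against a current VALUE of 252) still reflects the earlier state. Diffing a few raw values from the two tables makes the change easier to see. The values below are copied by hand from the two posts, and the `changed` helper is hypothetical.

```python
# Raw values copied from esamett's two attribute tables
# (before and after zero-filling the disk with Parted Magic).
BEFORE = {"Reallocated_Sector_Ct": 0, "Reallocated_Event_Count": 0,
          "Current_Pending_Sector": 1, "Offline_Uncorrectable": 0,
          "Multi_Zone_Error_Rate": 7619}
AFTER = {"Reallocated_Sector_Ct": 0, "Reallocated_Event_Count": 0,
         "Current_Pending_Sector": 0, "Offline_Uncorrectable": 0,
         "Multi_Zone_Error_Rate": 7620}


def changed(before: dict, after: dict) -> dict:
    """Return {attribute: (before, after)} for raw values that differ."""
    return {name: (before[name], after.get(name))
            for name in before if before[name] != after.get(name)}


print(changed(BEFORE, AFTER))
# {'Current_Pending_Sector': (1, 0), 'Multi_Zone_Error_Rate': (7619, 7620)}
```

Because Reallocated_Sector_Ct and Reallocated_Event_Count stayed at 0, one plausible reading is that the zero-fill rewrote the pending sector in place instead of remapping it. Multi_Zone_Error_Rate is still creeping up, so it seems worth watching.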

I will run tests from FreeNAS and post.

evan

esamett · Sep 28, 2014

after long smart test:
