Defective hard disk?

Status
Not open for further replies.

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
Hi!

(Sorry for my bad English.)

I have only one disk in my FreeNAS system, and it uses ZFS. This is logged constantly:


Aug 4 03:21:59 freenas smartd[2067]: Device: /dev/ada0, 65528 Currently unreadable (pending) sectors
Aug 4 03:22:05 freenas smartd[2067]: Device: /dev/ada0, 65528 Offline uncorrectable sectors
Aug 4 03:51:49 freenas smartd[2067]: Device: /dev/ada0, 65520 Currently unreadable (pending) sectors (changed -8)
Aug 4 03:51:49 freenas smartd[2067]: Device: /dev/ada0, 65520 Offline uncorrectable sectors (changed -8)
Aug 4 04:21:59 freenas smartd[2067]: Device: /dev/ada0, 65520 Currently unreadable (pending) sectors
Aug 4 04:21:59 freenas smartd[2067]: Device: /dev/ada0, 65520 Offline uncorrectable sectors
Aug 5 15:21:48 freenas smartd[2067]: Device: /dev/ada0, 65528 Currently unreadable (pending) sectors
Aug 6 16:51:48 freenas smartd[2067]: Device: /dev/ada0, 65512 Currently unreadable (pending) sectors
Aug 6 16:51:55 freenas smartd[2067]: Device: /dev/ada0, 65512 Offline uncorrectable sectors
Aug 6 17:21:48 freenas smartd[2067]: Device: /dev/ada0, 65520 Currently unreadable (pending) sectors (changed +8)
Aug 6 17:21:48 freenas smartd[2067]: Device: /dev/ada0, 65520 Offline uncorrectable sectors (changed +8)
Aug 7 03:21:59 freenas smartd[2067]: Device: /dev/ada0, 65520 Currently unreadable (pending) sectors
Aug 7 03:22:05 freenas smartd[2067]: Device: /dev/ada0, 65520 Offline uncorrectable sectors
Aug 7 03:51:59 freenas smartd[2067]: Device: /dev/ada0, 65528 Currently unreadable (pending) sectors (changed +8)
Aug 7 03:51:59 freenas smartd[2067]: Device: /dev/ada0, 65528 Offline uncorrectable sectors (changed +8)
Aug 5 15:21:55 freenas smartd[2067]: Device: /dev/ada0, 65528 Offline uncorrectable sectors
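
To follow how the reported count moves between checks, the smartd lines can be reduced to a timestamp, device and count. This is only a minimal sketch: the function name `pending_counts` is made up for illustration, the field positions assume exactly the line layout shown in the log above, and the log path in the usage comment is an assumption (the thread does not say which file the lines came from).

```shell
#!/bin/sh
# Sketch: list timestamp, device and count for each "pending sectors"
# line that smartd logged. Reads syslog-style lines on stdin.
# Field positions assume the exact line layout shown in the log above.
pending_counts() {
    awk 'index($0, "Currently unreadable (pending) sectors") {
        sub(/,$/, "", $7)           # "/dev/ada0," -> "/dev/ada0"
        print $1, $2, $3, $7, $8    # month day time device count
    }'
}

# Example (the log path is an assumption, not taken from the thread):
#   pending_counts < /var/log/messages
```

Lines about "Offline uncorrectable sectors" are skipped on purpose, so each check appears once.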

But I've checked the hard disk with several tools and everything seems to be OK. The SMART data shows no problems: no reallocated sectors and no errors. SMART is perfect.
Why is FreeNAS logging this?
Also, can I add a new hard disk to the system for redundancy, like RAID1, so that if this disk fails I don't lose all my data?
Thanks!
 

cr4sh

Cadet
Joined
Aug 7, 2013
Messages
7
Thanks, I already did this. A surface test doesn't show any errors. I tried nearly every tool on Hiren's and none of them shows an error. Only FreeNAS does.
But run a full test, not a quick one. I have a few disks that look OK on a quick test, but a full test shows bad sectors.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
But I've checked the hard disk with several tools and everything seems to be OK. The SMART data shows no problems: no reallocated sectors and no errors. SMART is perfect.
Why is FreeNAS logging this?
It is smartd logging the errors.
Code:
smartctl -q noserial -a /dev/ada0
Post the output inside [code][/code] tags.
 

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
It is smartd logging the errors.
Code:
smartctl -q noserial -a /dev/ada0
Post the output inside [code][/code] tags.


The output:

Code:
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST2000DM001-9YN164
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Wed Aug  7 22:42:38 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 226) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   099   006    Pre-fail  Always       -       31404008
  3 Spin_Up_Time            0x0003   095   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   037   037   020    Old_age   Always       -       65535
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       36498926
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7155
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       47
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   063   057   045    Old_age   Always       -       37 (2 65 41 0 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   037   037   000    Old_age   Always       -       126806
194 Temperature_Celsius     0x0022   037   043   000    Old_age   Always       -       37 (128 0 0 0 0)
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   001   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       232877421760197
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       110694220922854
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       44559423383423
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      7140         -
# 2  Extended offline    Completed without error       00%      7116         -
# 3  Extended offline    Completed without error       00%      7092         -
# 4  Extended offline    Completed without error       00%      7068         -
# 5  Extended offline    Completed without error       00%      7044         -
# 6  Extended offline    Completed without error       00%      7020         -
# 7  Extended offline    Completed without error       00%      6996         -
# 8  Extended offline    Completed without error       00%      6972         -
# 9  Extended offline    Completed without error       00%      6948         -
#10  Extended offline    Completed without error       00%      6927         -
#11  Extended offline    Completed without error       00%      6828         -
#12  Extended offline    Completed without error       00%      6804         -
#13  Extended offline    Completed without error       00%      6780         -
#14  Extended offline    Completed without error       00%      6756         -
#15  Extended offline    Completed without error       00%      6732         -
#16  Extended offline    Completed without error       00%      6708         -
#17  Extended offline    Completed without error       00%      6684         -
#18  Extended offline    Completed without error       00%      6660         -
#19  Extended offline    Completed without error       00%      6636         -
#20  Extended offline    Completed without error       00%      6614         -
#21  Extended offline    Completed without error       00%      6590         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
The output:
Uggh, stupid new forum. Attributes 197 & 198 are both zero. Hmm, what does smartd.conf look like?
Code:
cat /usr/local/etc/smartd.conf
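
To re-check just attributes 197 and 198 without reading the whole report, the attribute table can be filtered. A minimal sketch, assuming the ten-column table layout from the smartctl output posted above (RAW_VALUE in column 10); the function name `pending_attrs` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print ID, name and raw value of attributes 197 and 198 from
# `smartctl -A` output read on stdin. Column 10 is RAW_VALUE in the
# attribute table layout shown in the posted smartctl output.
pending_attrs() {
    awk '$1 == 197 || $1 == 198 { print $1, $2, $10 }'
}

# Example, run as root on the NAS:
#   smartctl -A /dev/ada0 | pending_attrs
```

Running it right after smartd logs a warning would show whether the drive itself reports a non-zero raw value at that moment.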


Also, can I add a new hard disk to the system for redundancy, like RAID1, so that if this disk fails I don't lose all my data?
Yes, you need to use the CLI to do it though.
 

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
Uggh, stupid new forum. Attributes 197 & 198 are both zero. Hmm, what does smartd.conf look like?
Code:
cat /usr/local/etc/smartd.conf

Code:
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
 
/dev/ada0 -n sleep -W 0,0,0 -m xxxx@gmail.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1|2|3|4|5|6|7)/(04)
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Code:
################################################
# smartd.conf generated by /etc/rc.d/ix-smartd
################################################
 
/dev/ada0 -n sleep -W 0,0,0 -m xxxx@gmail.com -s L/(01|02|03|04|05|06|07|08|09|10|11|12)/../(1|2|3|4|5|6|7)/(04)
That looks normal. What happens if you stop the drive from spinning down?
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
How do I do this?
[Attached screenshot: Volume6c.jpeg]
 

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
More errors were logged while I was copying more than 20 GB to the disk:

Code:
Aug  8 11:51:48 freenas smartd[2067]: Device: /dev/ada0, 65512 Currently unreadable (pending) sectors
Aug  8 11:51:56 freenas smartd[2067]: Device: /dev/ada0, 65512 Offline uncorrectable sectors


HDD Standby option is 'Always On'.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's a sign of a failing disk. Both of those messages mean you should replace the disk. I'd back up your data and do a disk replacement in accordance with the FreeNAS manual.
 

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
That's a sign of a failing disk. Both of those messages mean you should replace the disk. I'd back up your data and do a disk replacement in accordance with the FreeNAS manual.


But the disk is 6 months old and none of the external tests shows any problem. Only the smartd checks that run every 30 minutes show these errors, and not always.

Instead of backing up and replacing, I want to add a new hard disk for redundancy (like RAID1). Then, if this disk finally fails, I can just remove it and keep the second, new disk. Is there any HOWTO on how to do this?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Define "external tests". Smartd is showing those values because those are the values stored in the hard drive's SMART data.

Post the output of smartctl -a -q noserial /dev/ada0 in CODE.

Edit: I'm not sure how you plan to add a disk. You might not be able to. Post the output of zpool status in CODE as well.
 

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
Define "external tests". Smartd is showing those values because those are the values stored in the hard drive's SMART data.

Post the output of smartctl -a -q noserial /dev/ada0 in CODE.

Edit: I'm not sure how you plan to add a disk. You might not be able to. Post the output of zpool status in CODE as well.


"External tests" means booting the server from Hiren's BootCD and running nearly every disk test. Full surface tests don't show any errors. The SMART data is perfect. The smartctl output is already in post #7.

Output of zpool status:

Code:
  pool: zfs1
 state: ONLINE
  scan: scrub repaired 0 in 1h7m with 0 errors on Sun Jul  7 01:07:22 2013
config:
 
        NAME                                          STATE     READ WRITE CKSUM
        zfs1                                          ONLINE       0     0     0
          gptid/65e2f2e9-feb7-11e1-975d-009c02a7fa32  ONLINE       0     0     0
 
errors: No known data errors

I have no plan. It's just what I think is the best solution, but I don't know whether it can be done: whether I can add a new disk to a pool, like a traditional RAID1, to add redundancy.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
HDD Standby option is 'Always On'.
Forgot to say you need to reboot afterwards.

What's your hardware and how is the drive connected?

Define "external tests". Smartd is showing those values because those are the values stored in the hard drive's SMART data.

Post the output of smartctl -a -q noserial /dev/ada0 in CODE.
Scroll up.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
As for the SMART errors... WTF!? That doesn't make any sense at all!

You can add a second disk as a mirror, but only from the command line.
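
For reference, the command-line route is `zpool attach`, which turns a single-disk vdev into a two-way mirror and resilvers onto the new disk. What follows is a hedged sketch, not the FreeNAS-documented procedure: the pool and existing device names are taken from the zpool status output earlier in the thread, `ada1` is a hypothetical name for the new disk, and the helper `attach_mirror` is made up for illustration. FreeNAS normally partitions disks and refers to them by gptid, which this sketch skips. The new disk must be at least as large as the existing one. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: attach a second disk to a single-disk pool so it becomes a mirror.
# Prints the commands by default; set RUN="" in the environment to execute them.
attach_mirror() {
    pool=$1
    existing=$2
    new=$3
    ${RUN-echo} zpool attach "$pool" "$existing" "$new"   # starts a resilver onto $new
    ${RUN-echo} zpool status "$pool"                      # shows the mirror and resilver progress
}

# Pool and existing device come from the zpool status output in the thread;
# "ada1" is a hypothetical name for the new disk.
attach_mirror zfs1 gptid/65e2f2e9-feb7-11e1-975d-009c02a7fa32 ada1
```

Back up the data before running this for real; attaching is non-destructive to the existing disk, but the thread leaves open whether that disk is healthy.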
 