Defective hard disk?

Status
Not open for further replies.

juanjico

Dabbler
Joined
Sep 18, 2012
Messages
31
Forgot to say you need to reboot afterwards.

What's your hardware and how is the drive connected?

It's a HP MicroServer, and disk is connected directly to a SATA port.

I'll try to reboot the server and see what happens.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
as for the SMART errors.. WTF!? That doesn't make any sense at all!
Strange, no?

It's a HP MicroServer, and disk is connected directly to a SATA port.
Actually, do a shutdown and try a different SATA port. If you have another SATA cable you can try changing that as well.
Code:
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4295032833
That should be zero. I have a few ST2000DM001-9YN164, but with CC4B firmware. They all properly show zero for 188 Command_Timeout.
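For what it's worth, the raw number looks less random once it is split into 16-bit words: 4295032833 is 0x000100010001. Below is a minimal POSIX sh sketch of that split. The idea that the drive packs a separate counter into each word is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Split a 48-bit SMART raw value into three 16-bit words: high, middle, low.
# This only does the arithmetic; what each word counts is vendor-specific
# and is an assumption here, not something smartctl 5.43 reports.
split_raw48() {
    raw=$1
    high=$(( (raw >> 32) & 0xFFFF ))
    mid=$(( (raw >> 16) & 0xFFFF ))
    low=$(( raw & 0xFFFF ))
    echo "$high $mid $low"
}

split_raw48 4295032833    # prints "1 1 1"
```

Newer smartctl releases can print this split themselves with `-v 188,raw16`. Check `man smartctl` for the version you have installed.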
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
Oops! OK, I changed it to Always on.

This is what I had before:
[Attachment: Captura1.PNG]
I made a mistake, and I have deleted what I had written.
 

juanjico

I've rebooted the server and moved the drive to another bay to change the SATA port.

While I was copying a lot of files to the server, I received the fatal email:

Code:
Aug 12 22:26:02 freenas smartd[2090]: Device: /dev/ada0, 65528 Currently unreadable (pending) sectors
Aug 12 22:26:09 freenas smartd[2090]: Device: /dev/ada0, 65528 Offline uncorrectable sectors


But while I was copying the files, the copy dialog hung for about 10 seconds (the disk light on the server stayed on the whole time) and then the copy continued. Then I received the email with the SMART errors.

Then I logged in to the server and ran the smartctl command:

Code:
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:    ST2000DM001-9YN164
Firmware Version: CC4H
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Mon Aug 12 22:31:30 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (  1) minutes.
Extended self-test routine
recommended polling time:        ( 226) minutes.
Conveyance self-test routine
recommended polling time:        (  2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       171232744
  3 Spin_Up_Time            0x0003   099   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   037   037   020    Old_age   Always       -       65535
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always       -       36939868
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       7275
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       48
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       4295032833
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   057   045    Old_age   Always       -       36 (2 65 38 31 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   037   037   000    Old_age   Always       -       127118
194 Temperature_Celsius     0x0022   036   043   000    Old_age   Always       -       36 (128 0 0 0 0)
197 Current_Pending_Sector  0x0012   001   001   000    Old_age   Always       -       65520
198 Offline_Uncorrectable   0x0010   001   001   000    Old_age   Offline      -       65520
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       196164041314101
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       112737015761306
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       44568031752441
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error      00%      7260        -
# 2  Extended offline    Completed without error      00%      7236        -
# 3  Extended offline    Completed without error      00%      7212        -
# 4  Extended offline    Completed without error      00%      7188        -
# 5  Extended offline    Completed without error      00%      7164        -
# 6  Extended offline    Completed without error      00%      7140        -
# 7  Extended offline    Completed without error      00%      7116        -
# 8  Extended offline    Completed without error      00%      7092        -
# 9  Extended offline    Completed without error      00%      7068        -
#10  Extended offline    Completed without error      00%      7044        -
#11  Extended offline    Completed without error      00%      7020        -
#12  Extended offline    Completed without error      00%      6996        -
#13  Extended offline    Completed without error      00%      6972        -
#14  Extended offline    Completed without error      00%      6948        -
#15  Extended offline    Completed without error      00%      6927        -
#16  Extended offline    Completed without error      00%      6828        -
#17  Extended offline    Completed without error      00%      6804        -
#18  Extended offline    Completed without error      00%      6780        -
#19  Extended offline    Completed without error      00%      6756        -
#20  Extended offline    Completed without error      00%      6732        -
#21  Extended offline    Completed without error      00%      6708        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Attributes 197 and 198 now show 65520 as their raw value.

Is this totally random?
 

juanjico

I reran the smartctl command, and now it shows the values as 0!!!

Code:
197 Current_Pending_Sector  0x0012   100   001   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   001   000    Old_age   Offline      -       0

What is happening here???
The raw value keeps changing between 0, 65520 and 65528.
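To see how often 197 and 198 flip, one option is to pull their raw values out of `smartctl -A` output and log them over time. Below is a minimal sketch. It assumes smartctl's usual attribute-table layout, with the ID in column 1 and the raw value in column 10. The device name /dev/ada0 and the one-minute interval in the commented loop are illustrative only.

```shell
#!/bin/sh
# Read `smartctl -A` style output on stdin and print "ID RAW_VALUE" for each
# requested attribute ID. Lines with fewer than 10 fields (headers, the
# selective self-test table) are skipped.
raw_values() {
    awk -v ids="$*" '
        BEGIN { n = split(ids, want, " "); for (i = 1; i <= n; i++) keep[want[i]] = 1 }
        ($1 in keep) && NF >= 10 { print $1, $10 }
    '
}

# Hypothetical polling loop, not from the thread: sample the pending and
# offline-uncorrectable counters once a minute. Uncomment to use it.
# while :; do
#     printf '%s ' "$(date '+%H:%M:%S')"
#     smartctl -A /dev/ada0 | raw_values 197 198 | tr '\n' ' '
#     echo
#     sleep 60
# done
```

Logging a timestamp next to each sample makes it easier to see whether the jumps line up with heavy writes, like the file copy described above.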
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not a clue, but I'd be replacing that drive ASAP. Something is VERY wrong with it.
 

paleoN

Not a clue, but I'd be replacing that drive ASAP. Something is VERY wrong with it.
Agreed.

You can also check with e.g. GParted Live, which includes smartctl, just to see if it reports anything different.
 

juanjico

I'm finally replacing the disk. I've started hearing some 'clicks' from it, so I've powered off the server.

Can I just boot the server from a live CD and 'dd' the old disk onto the new one? Will this work with ZFS? Will the FreeNAS system recognize the new disk as the old one?

Also, is there a way to use a 3TB disk as the replacement instead of a 2TB one? If I do the 'dd' trick, I'll lose 1TB of space. So, is there a way to upgrade from a 2TB to a 3TB disk?

Thanks!!!
 

paleoN

Can I just boot the server from a live CD and 'dd' the old disk onto the new one?
Don't use dd on a failing disk. Normally, you should attach the new disk to create a mirror, but running ddrescue, NOT dd, will likely be less stressful. That is assuming the pool has a lot of data.

Also, is there a way to use a 3TB disk as the replacement instead of a 2TB one?
First, partition the new disk.
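Code:
# ada0 stands for the NEW, empty disk here; confirm its real name first
gpart create -s gpt ada0

# partition 1: 2 GB swap, starting at sector 128
gpart add -b 128 -i 1 -t freebsd-swap -s 2G ada0

# partition 2: ZFS, using all remaining space on the larger disk
gpart add -i 2 -t freebsd-zfs ada0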
Follow that with a ddrescue of the ZFS partition (not the whole drive) from the original disk to the ZFS partition on the new disk, e.g. input axdap2, output axdbp2. ddrescue takes these as positional arguments, not dd-style if=/of=. Be absolutely certain which drive is which, and check the serials first. If the clone is successful, the pool will auto-expand on import, or you simply need to online it with zpool online -e.
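Mixing up the source and destination destroys data. One precaution is to build the ddrescue command with a guard and read it before running anything. Below is a dry-run sketch. It assumes FreeBSD device names (adaXp2) and GNU ddrescue's positional `infile outfile mapfile` syntax. The disk names and /tmp/clone.map are hypothetical. On a Linux live CD, the partitions would instead have names like /dev/sdb2.

```shell
#!/bin/sh
# Dry run: print (never execute) a GNU ddrescue command that copies the ZFS
# partition (p2) of the old disk onto partition p2 of the new disk.
# OLD/NEW are hypothetical disk names such as ada1; /tmp/clone.map is an
# example map-file path.
clone_cmd() {
    old=$1
    new=$2
    if [ -z "$old" ] || [ -z "$new" ]; then
        echo "usage: clone_cmd OLD_DISK NEW_DISK" >&2
        return 2
    fi
    if [ "$old" = "$new" ]; then
        echo "refusing: source and destination are the same disk" >&2
        return 1
    fi
    # Argument order is infile, outfile, mapfile. -f is needed because the
    # output is a device node rather than a regular file.
    echo "ddrescue -f /dev/${old}p2 /dev/${new}p2 /tmp/clone.map"
}

# Example: old disk came up as ada1, new disk as ada2 (check with gpart show).
clone_cmd ada1 ada2
```

The map file lets an interrupted run resume where it stopped, so keep it somewhere other than the two disks being cloned.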
 

juanjico

Thanks @paleoN

The new drive will arrive tomorrow. I'll report back here the results.
 

juanjico

@paleoN

How did you find the names axdap2 and axdbp2 for ddrescue ? I'm getting

Code:
ddrescue: Can't open input file: No such file or directory

 

paleoN

How did you find the names axdap2 and axdbp2 for ddrescue ?
Lose the space before punctuation.

The axdXp2 names are deliberately nonexistent devices, in case someone were to blindly copy/paste the example. There is no way to tell ahead of time in what order the drives will come up. Both dd and ddrescue operate at the block level: if you mix up the input and output, they will destroy your existing data.
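Code:
# list attached disks with model and device name (ada0, ada1, ...)
camcontrol devlist

# show each disk's partition table, e.g. which one already has a ZFS partition
gpart show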
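To act on "check the serials first", one option is to read each disk's serial number from `smartctl -i` and match it against the label on the physical drive. `smartctl -i` normally prints a `Serial Number:` line. Below is a short sketch. The /dev/ada[0-9] glob in the commented loop is an assumption about how the disks are named on this system.

```shell
#!/bin/sh
# Read `smartctl -i` output on stdin and print the value of the
# "Serial Number:" line (fields split on a colon plus any spaces).
serial_of() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Hypothetical usage on the live system: print every ada disk with its
# serial so it can be matched to the physical drive before running ddrescue.
# for d in /dev/ada[0-9]; do
#     printf '%s %s\n' "$d" "$(smartctl -i "$d" | serial_of)"
# done
```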
 