Quick question about HDD testing and SMART conveyance test

Status
Not open for further replies.

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
Using FreeNAS-9.2.1.3-RELEASE-x64

I have a 3 TB SATA HDD that was considered "bad" but I have reasons to believe that it was the controller card of the computer it came from that was bad.

Before using it, I'd like to make sure it's good.

I read:
This: http://doc.freenas.org/index.php/S.M.A.R.T.
A good part of this: http://smartmontools.sourceforge.net/man/smartd.8.html
And this very interesting thread: http://forums.freenas.org/index.php?threads/testing-hard-drives.15436/

My question is:

Do you think running a conveyance test and getting positive results would be reliable enough for me to consider the HDD as good?

I also read that the "Long/Extended" test takes roughly 1 minute per gigabyte... a quick calculation puts that at about 50 hours for a 3 TB drive... so a little over 2 days...

I don't mind doing it, but everything I read only talks about the conveyance test.

Anyhow, in conclusion, what should I do to make sure the drive is good (with SMART or not, btw)?


Thanks!!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
50 hours for a 3TB? Hogwash.

I bet it's less than 12 hours, since I've never seen even a 4TB drive take that long. If you look at the smartctl -a data for your disk, it tells you exactly how many minutes a test takes to complete. Typical times are 6-9 hours for 3-4TB drives.
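As a rough illustration of reading that estimate, here is a minimal sketch that pulls the "Extended self-test routine recommended polling time" value out of smartctl output and converts it to hours. The function name extended_test_hours is made up for this example, and the sketch assumes the two-line layout that smartctl 6.2 prints for this drive later in the thread; other versions may format it differently.

```shell
# Print the drive's own estimate for an extended self-test, in hours.
# Reads `smartctl -c` (or `smartctl -a`) output on stdin.
# extended_test_hours is a hypothetical helper name, not a smartctl feature.
extended_test_hours() {
    awk '
        /^Extended self-test routine/ {
            # The minutes value sits on the following line, e.g.
            # "recommended polling time:      ( 342) minutes."
            if ((getline line) > 0) {
                gsub(/[^0-9]/, "", line)   # keep only the digits
                printf "%.1f\n", line / 60
            }
        }
    '
}
```

It could be used as smartctl -c /dev/XXX | extended_test_hours. For the 342 minutes this drive reports, that works out to 5.7 hours, far below the 50-hour rule-of-thumb figure.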

Conveyance is wholly inadequate for your needs. I'd consider your disk good only if all SMART data on the disk is good, badblocks finds no problems over a few passes, and a long test finishes without errors.
 

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
cyberjock said:
Conveyance is wholly inadequate for your needs. I'd consider your disk good only if all smart data on the disk is good, badblocks for a few passes finds no problems, and a long test finishes without errors.

So, if I run these:
Code:
badblocks -wvs /dev/XXX
smartctl --test=long /dev/XXX


And everything comes out nice, I'm all good, right?

(I don't mind losing the data, thus the badblocks -w)

I'm not sure what you mean by "if all smart data on the disk is good" though?

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What I meant was: run smartctl -a /dev/xxx, look at the output, and see if there are any indicators of a failing disk.
 

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
Back to this, I finally got access to the drive I wanted to test.

First I attached it to the FreeNAS server using this equipment:
http://ca.startech.com/HDD/Adapters/USB-20-to-IDE-or-SATA-Adapter-Cable~USB2SATAIDE

Second, I checked to see if it was "detected":
Code:
smartctl --scan
/dev/da1 -d usbjmicron # /dev/da1 [USB JMicron], ATA device


Third, I tried running badblocks:
Code:
badblocks -wvs /dev/da1
Checking for bad blocks in read-write mode
From block 0 to 2930266583
Testing with pattern 0xaa: 0 0.00% done, 0:07 elapsed. (0/0/0 errors)
1 0.00% done, 0:10 elapsed. (0/1/0 errors)
2 0.00% done, 0:13 elapsed. (0/2/0 errors)
3 0.00% done, 0:16 elapsed. (0/3/0 errors)
4 0.00% done, 0:20 elapsed. (0/4/0 errors)
5 0.00% done, 0:23 elapsed. (0/5/0 errors)
6 0.00% done, 0:26 elapsed. (0/6/0 errors)
7 0.00% done, 0:29 elapsed. (0/7/0 errors)
8 0.00% done, 0:33 elapsed. (0/8/0 errors)
9 0.00% done, 0:36 elapsed. (0/9/0 errors)


...every block returns a write error...
So I interrupted it and suspected the disk was mounted read-only...
Code:
mount -w /dev/da1
mount: /dev/da1: unknown special file or file system


At this point, I'm not sure what to do :(

Please advise!

I don't care about the data... I just want to test if the drive is good or not.

In the meantime I started a:
Code:
smartctl --test=long /dev/da1
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
badblocks won't work if the disk is mounted, so that isn't your problem. The USB-SATA adapter may be your problem, or the drive could just be bad. Do you have any way of doing a direct SATA connection?
 

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
Yes, of course I could open the server and install it as a "real" SATA drive!

I didn't think connecting it over USB would be an issue.

I'll do that and report back :)

Thanks!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I don't know if the USB/SATA adapter is the problem or not, but it seems they often cause problems, and eliminating a variable is a good idea. Right now, the badblocks output is making it look like the drive is bad. What's the result of the SMART test?
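For reading the badblocks output posted above: each progress line ends in a counter such as (0/9/0 errors). As far as I know, the three numbers are read errors, write errors, and corruption (compare) errors, in that order, which matches the middle number climbing while every write failed. A minimal sketch that labels them follows; badblocks_counts is a hypothetical helper name.

```shell
# Label the three counters at the end of a badblocks progress line.
# Assumes the "(R/W/C errors)" layout seen in the output above;
# badblocks_counts is a hypothetical helper name.
badblocks_counts() {
    sed -n 's|.*(\([0-9]*\)/\([0-9]*\)/\([0-9]*\) errors).*|read=\1 write=\2 corruption=\3|p'
}
```

For example, echo '9 0.00% done, 0:36 elapsed. (0/9/0 errors)' | badblocks_counts labels the last line of that run. badblocks appears to write its progress to stderr, so a live run would probably need 2>&1 before the pipe.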
 

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
Revisiting this thread again, now that I've put the drive inside of the server, plugged in with SATA.

I ran an extended test and this is the output:
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda 7200.14 (AF)
Device Model:    ST3000DM001-9YN166
Serial Number:    W1F0SVLA
LU WWN Device Id: 5 000c50 052a45963
Firmware Version: CC4B
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:    512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Jun 13 16:23:41 2014 EDT
 
==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en
 
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121)    The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline
data collection:        (  584) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 342) minutes.
Conveyance self-test routine
recommended polling time:      (  2) minutes.
SCT capabilities:            (0x3085)    SCT Status supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  090  080  006    Pre-fail  Always      -      58779786
  3 Spin_Up_Time            0x0003  093  092  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      41
  5 Reallocated_Sector_Ct  0x0033  099  099  036    Pre-fail  Always      -      2224
  7 Seek_Error_Rate        0x000f  082  060  030    Pre-fail  Always      -      172591029
  9 Power_On_Hours          0x0032  083  083  000    Old_age  Always      -      14915
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      41
183 Runtime_Bad_Block      0x0032  100  100  000    Old_age  Always      -      0
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  001  001  000    Old_age  Always      -      37700
188 Command_Timeout        0x0032  100  100  000    Old_age  Always      -      0 0 2
189 High_Fly_Writes        0x003a  099  099  000    Old_age  Always      -      1
190 Airflow_Temperature_Cel 0x0022  063  058  045    Old_age  Always      -      37 (Min/Max 36/37)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      31
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      68
194 Temperature_Celsius    0x0022  037  042  000    Old_age  Always      -      37 (0 21 0 0 0)
197 Current_Pending_Sector  0x0012  099  001  000    Old_age  Always      -      200
198 Offline_Uncorrectable  0x0010  099  001  000    Old_age  Offline      -      200
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      14517h+51m+20.042s
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      4426123131666
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      258434767002645
 
SMART Error Log Version: 1
ATA Error Count: 37700 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 37700 occurred at disk power-on lifetime: 14734 hours (613 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  2d+02:29:30.827  READ DMA EXT
  27 00 00 00 00 00 e0 00  2d+02:29:30.826  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  2d+02:29:30.818  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  2d+02:29:30.810  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  2d+02:29:30.786  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 37699 occurred at disk power-on lifetime: 14734 hours (613 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  2d+02:29:27.951  READ DMA EXT
  27 00 00 00 00 00 e0 00  2d+02:29:27.950  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  2d+02:29:27.942  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  2d+02:29:27.935  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  2d+02:29:27.910  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 37698 occurred at disk power-on lifetime: 14734 hours (613 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  2d+02:29:25.066  READ DMA EXT
  27 00 00 00 00 00 e0 00  2d+02:29:25.065  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  2d+02:29:25.057  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  2d+02:29:25.051  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  2d+02:29:25.033  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 37697 occurred at disk power-on lifetime: 14734 hours (613 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  2d+02:29:22.129  READ DMA EXT
  27 00 00 00 00 00 e0 00  2d+02:29:22.129  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  2d+02:29:22.121  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  2d+02:29:22.099  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  2d+02:29:22.001  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 37696 occurred at disk power-on lifetime: 14734 hours (613 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 08 ff ff ff ef 00  2d+02:29:19.149  READ DMA EXT
  27 00 00 00 00 00 e0 00  2d+02:29:19.148  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  2d+02:29:19.140  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  2d+02:29:19.133  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  2d+02:29:19.112  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure      90%    14912        -
# 2  Extended offline    Completed: read failure      90%    14909        -
# 3  Extended offline    Completed: read failure      90%    14905        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


I keep getting those "read failure" statuses.
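To make those statuses easier to spot in a long dump, here is a minimal sketch that counts how many entries in the self-test log are anything other than "Completed without error". The name selftest_summary is made up for this example, and it assumes the "# 1  Extended offline ..." row layout shown in the log above. A test that is still running or was aborted would also be counted as not clean.

```shell
# Summarise the SMART self-test log read from stdin
# (e.g. the output of `smartctl -l selftest`, or the full `smartctl -a` dump).
# selftest_summary is a hypothetical helper name.
selftest_summary() {
    awk '
        # Log rows start with "# 1", "# 2", ... "#10", and so on.
        /^# *[0-9]+ / {
            total++
            if ($0 !~ /Completed without error/) failed++
        }
        END {
            printf "%d of %d logged self-tests did not complete without error\n", failed, total
        }
    '
}
```

It could be run as smartctl -l selftest /dev/XXX | selftest_summary. On the log above, all three extended tests would be counted as failed.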

I would like to know if you guys would now consider the drive as being bad?

Thanks!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Your drive has reported nearly 38,000 errors in its lifetime. In addition, it has 200 offline uncorrectable sectors. I'd consider it bad.
 

BlazeStar

Patron
Joined
Apr 6, 2014
Messages
383
Thanks danb35.

I'm still rather clueless about how to interpret the results.

I'll throw this bad boy away.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Here's a crash course for Seagate drives (WD drives are easier to read without a calculator):

Parameters 1 and 7 should be 0, but they're represented in an inverse log scale, so insanely large numbers are good.

Parameters 5, 187, 197, 198 and 184 should all be 0. The first four indicate you're getting bad sectors, the last one indicates the drive's electronics are crapping out.
I'd tolerate less than 5 reallocated sectors if they were cleanly reallocated (197 and 198 both 0) and the number wasn't growing. 2000+ bad sectors is trash.
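The check on parameters 5, 184, 187, 197 and 198 can be sketched as a small filter over the attribute table. This is only an illustration: flag_nonzero_attrs is a made-up name, it assumes the ten-column layout from the smartctl output earlier in the thread (raw value in the tenth field), and it simply lists every nonzero value rather than applying the "fewer than 5, cleanly reallocated" tolerance.

```shell
# List the attributes that should be 0 but are not.
# Reads `smartctl -A` (or `smartctl -a`) output on stdin.
# flag_nonzero_attrs is a hypothetical helper name.
flag_nonzero_attrs() {
    awk '
        # $1 = ID, $2 = name, $3 = flag (0x....), $10 = raw value
        $1 ~ /^(5|184|187|197|198)$/ && $3 ~ /^0x/ {
            if ($10 + 0 != 0) printf "%s %s raw=%s\n", $1, $2, $10
        }
    '
}
```

Run as smartctl -A /dev/XXX | flag_nonzero_attrs. Empty output means all five are 0. For the drive in this thread it would list 5, 187, 197 and 198.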
 