
Smartctl: Is my drive broken?

Joined
Sep 9, 2019
Messages
3
Thanks
0
#1
Hello

I ran smartctl -a on my drive and got the following results.

The weird thing is that one of my partitions does not mount, but the other one mounts fine. That makes me wonder whether there is data corruption on just that one partition or whether the whole drive is in fact broken.

Thank you for your advice.


Code:
root@myhost:/home/user1# smartctl -a /dev/sdb
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-5.0.0-13-generic] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST1000LM035-1RK172
Serial Number:    WDE4S5YD
LU WWN Device Id: 5 000c50 0a855e37a
Firmware Version: ACM1
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Sep  9 03:34:09 2019 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)    The previous self-test completed having
                    the read element of the test failed.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x71) SMART execute Offline immediate.
                    No Auto Offline data collection support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 166) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3035)    SCT Status supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   080   052   006    Pre-fail  Always       -       107767397
  3 Spin_Up_Time            0x0003   099   099   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   037   037   020    Old_age   Always       -       65535
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       1672
  7 Seek_Error_Rate         0x000f   082   060   045    Pre-fail  Always       -       4453874815
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5653 (173 92 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       691
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       28429
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   055   040    Old_age   Always       -       31 (Min/Max 20/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       92
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       98
193 Load_Cycle_Count        0x0032   042   042   000    Old_age   Always       -       116839
194 Temperature_Celsius     0x0022   031   045   000    Old_age   Always       -       31 (0 11 0 0 0)
197 Current_Pending_Sector  0x0012   096   095   000    Old_age   Always       -       368
198 Offline_Uncorrectable   0x0010   096   095   000    Old_age   Offline      -       368
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       5460 (18 67 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       10332521622
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13791968337
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 28431 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 28431 occurred at disk power-on lifetime: 5444 hours (226 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+20:04:14.516  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00   3d+20:04:14.505  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00   3d+20:04:14.479  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   3d+20:04:14.477  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   3d+20:04:14.465  SET FEATURES [Set transfer mode]

Error 28430 occurred at disk power-on lifetime: 5444 hours (226 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+20:04:14.260  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:14.259  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:14.259  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:14.259  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:14.259  READ FPDMA QUEUED

Error 28429 occurred at disk power-on lifetime: 5444 hours (226 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+20:04:14.031  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00   3d+20:04:14.021  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00   3d+20:04:13.995  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   3d+20:04:13.994  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   3d+20:04:13.981  SET FEATURES [Set transfer mode]

Error 28428 occurred at disk power-on lifetime: 5444 hours (226 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+20:04:13.761  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:13.721  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:13.721  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:13.721  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00   3d+20:04:13.721  READ FPDMA QUEUED

Error 28427 occurred at disk power-on lifetime: 5444 hours (226 days + 20 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00   3d+20:04:13.521  READ FPDMA QUEUED
  ef 10 02 00 00 00 a0 00   3d+20:04:13.511  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 00   3d+20:04:13.484  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00   3d+20:04:13.483  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00   3d+20:04:13.470  SET FEATURES [Set transfer mode]

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%      5653         291803192
# 2  Short offline       Completed: read failure       90%      5653         291803192
# 3  Extended offline    Completed: read failure       90%      5652         291803192

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
Joined
Sep 9, 2019
Messages
3
Thanks
0
#2
Apologies, I forgot to ask for your advice on whether I can recover some of the data off the drive. I cannot mount the partition in question, and fsck fails with the following error:

Code:
# fsck.ext4 -v /dev/mapper/backup1
e2fsck 1.44.6 (5-Mar-2019)
Error reading block 103,841,791 (Invalid argument).  Ignore error<y>? yes
Force rewrite<y>? yes
Superblock has an invalid journal (inode 8).
Clear<y>? yes
*** journal has been deleted ***

The filesystem size (according to the superblock) is 207,718,400 blocks
The physical size of the device is 72,154,173 blocks
Either the superblock or the partition table is likely to be corrupt!
Abort<y>? yes
Error writing block 103,841,791 (Invalid argument).  Ignore error<y>? yes

/dev/mapper/backup1: ***** FILE SYSTEM WAS MODIFIED *****
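The size mismatch e2fsck complains about can be sanity-checked with a little arithmetic. The sketch below assumes the ext4 filesystem uses 4 KiB blocks and that /dev/mapper/backup1 is the LUKS mapping of sda3 (the OP later says sda3 is LUKS-encrypted ext4, but the mapping itself is not shown). Under those assumptions the mapper device comes out exactly 16 MiB smaller than sda3, which is consistent with a LUKS2 header of the default size, while the superblock claims a filesystem roughly 2.9 times larger than the partition. That points at a damaged superblock rather than a bad partition table, though it is only a plausibility check.

```python
# Numbers copied from the e2fsck and fdisk output in this thread.
FS_BLOCK = 4096   # ASSUMPTION: the ext4 filesystem uses 4 KiB blocks
SECTOR = 512      # logical sector size reported by smartctl

superblock_blocks = 207_718_400  # "filesystem size (according to the superblock)"
device_blocks = 72_154_173       # "physical size of the device" (/dev/mapper/backup1)
sda3_sectors = 577_266_152       # fdisk "Sectors" column for sda3

fs_bytes = superblock_blocks * FS_BLOCK
dev_bytes = device_blocks * FS_BLOCK
part_bytes = sda3_sectors * SECTOR

def gib(n):
    """Bytes -> GiB."""
    return n / 1024 ** 3

print(f"superblock claims  {gib(fs_bytes):6.1f} GiB")    # 792.4
print(f"mapper device      {gib(dev_bytes):6.1f} GiB")   # 275.2
print(f"sda3 partition     {gib(part_bytes):6.1f} GiB")  # 275.3
print(f"sda3 - mapper      {(part_bytes - dev_bytes) // 2**20} MiB")  # 16
```

If the primary superblock is the problem, e2fsck can be pointed at a backup superblock with -b, and running it with -n first keeps the check read-only, ideally against an image rather than the failing disk.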
 
Joined
Oct 18, 2018
Messages
433
Thanks
206
#3
The weird thing is one partition I have does not mount but the other one mounts fine.
How are you trying to mount the drive? Are you using the FreeNAS GUI?

5 Reallocated_Sector_Ct 0x0033 097 097 036 Pre-fail Always - 1672
197 Current_Pending_Sector 0x0012 096 095 000 Old_age Always - 368
198 Offline_Uncorrectable 0x0010 096 095 000 Old_age Offline - 368
Your drive is not in great shape. What is your pool layout (the output of zpool status)?
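The three attributes quoted above are the usual ones to look at. As a sketch, the following Python scans the attribute table printed by smartctl -A and reports any attribute on a small watch list (IDs 5, 187, 197 and 198, a choice made here for illustration) whose raw value is non-zero. IDs 1 and 7 are deliberately left out: on Seagate drives their raw values are packed vendor-specific counters, so a large number there is not by itself a sign of trouble.

```python
import re

# Watch list chosen for illustration: non-zero raw values here usually mean media trouble.
WATCH = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# ID, name, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, then the leading RAW number
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

def parse_attributes(text):
    """Map attribute ID -> (name, leading integer of RAW_VALUE); other lines are ignored."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
    return attrs

def warnings(attrs):
    """Return {name: raw} for watched attributes whose raw value is non-zero."""
    return {name: attrs[i][1] for i, name in WATCH.items() if i in attrs and attrs[i][1] > 0}

# Rows copied from the smartctl output in this thread
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       1672
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       28429
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   096   095   000    Old_age   Always       -       368
198 Offline_Uncorrectable   0x0010   096   095   000    Old_age   Offline      -       368
"""

print(warnings(parse_attributes(SAMPLE)))
```

On a live system the text would come from something like subprocess.run(["smartctl", "-A", "/dev/sdb"], capture_output=True, text=True).stdout.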
 
Joined
Sep 9, 2019
Messages
3
Thanks
0
#4
How are you trying to mount the drive? Are you using the FreeNAS GUI?
Your drive is not in great shape. What is your pool layout (the output of zpool status)?
Thanks for your reply.

My drive is ext4; I am not using ZFS. I am mounting from the Linux command line (sudo mount /dev/sda1).

Here are my questions about the above. I'd really appreciate any answers you have; it really helps, thank you!

  1. Current_Pending_Sector is 368. Does that mean there are potentially 368 bad sectors on the drive?

  2. After correcting these 368 bad sectors, can I then mount the drive?

  3. I would use dd to zero out the bad blocks/sectors, correct?

  4. Before that, can I recover the data in these bad blocks, just in case it is critical data?

  5. From the above outputs, is there any PHYSICAL problem with the disk? After fixing all the bad sectors, is it going to work again?
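On question 3: smartctl reports 512-byte logical and 4096-byte physical sectors, so each pending sector is really a 4 KiB physical sector spanning 8 LBAs, and a rewrite has to cover all 8. Overwriting gives the firmware a chance to clear or reallocate the sector, but the old contents are gone, so it does not by itself repair the filesystem and only makes sense after imaging (question 4). Note also that a self-test stops at its first read failure, so this output identifies only one of the 368 pending LBAs. The sketch below uses a placeholder device name, /dev/sdX, and only computes the aligned range and builds the dd argument list; it deliberately executes nothing.

```python
LOGICAL = 512                    # "512 bytes logical" (smartctl)
PHYSICAL = 4096                  # "4096 bytes physical" (smartctl)
PER_PHYS = PHYSICAL // LOGICAL   # 8 LBAs share one physical sector

def physical_span(lba):
    """First and last LBA of the 4 KiB physical sector that contains `lba`."""
    first = (lba // PER_PHYS) * PER_PHYS
    return first, first + PER_PHYS - 1

def dd_zero_command(device, lba):
    """Build, but do not run, a dd command that overwrites that whole physical sector.

    With bs=4096, dd's seek= counts 4 KiB blocks, so the block index is lba // 8.
    WARNING: running it destroys whatever data is in that sector.
    """
    return [
        "dd", "if=/dev/zero", f"of={device}", f"bs={PHYSICAL}",
        "count=1", f"seek={lba // PER_PHYS}", "oflag=direct",
    ]

# LBA_of_first_error from the self-test log; /dev/sdX is a placeholder device name
print(physical_span(291803192))
print(" ".join(dd_zero_command("/dev/sdX", 291803192)))
```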

Thank you so, so much. I've also added some extra things I noticed; feel free to read them or not!



smartctl -a:

  • ATA Error Count = 28431; Power_On_Hours = 5653; the first error listed in the device error log occurred at 5444 hours.

SMART Test Log:

  • LBA_of_first_error value: 291803192.

Device Error Log:

  • UNC uncorrectable error event: 40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455

  • LBA 268435455 falls in partition 2 and LBA 291803192 in partition 3.
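The repeated LBA 268435455 deserves a caution. The register columns in this error log are 28-bit ATA registers, and 268435455 (0x0FFFFFFF) is 2^28 - 1, the value with every address bit set. This drive has about 1.95 billion LBAs, so most of its sectors lie beyond what 28 bits can express, and errors there appear to be logged with this saturated value. The real failing LBA is therefore most likely higher and unknown from this output; smartctl -l xerror reads the extended error log, which records 48-bit LBAs if the drive supports it. The short sketch below reassembles the LBA from the register bytes to show where the number comes from.

```python
def lba28(sn, cl, ch, dh):
    """Reassemble a 28-bit LBA: SN = bits 0-7, CL = 8-15, CH = 16-23, low nibble of DH = 24-27."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn

# "After command completion" registers from the error log: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in "40 51 00 ff ff ff 0f".split())

lba = lba28(sn, cl, ch, dh)
print(lba, hex(lba))         # 268435455 0xfffffff
print(lba == 2 ** 28 - 1)    # True: every one of the 28 address bits is set
```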

FDISK:

Code:
Device           Start          End      Sectors    Size  Type
/dev/sda1        2,048      206,847      204,800    100M  Microsoft basic data
/dev/sda2      239,616  289,302,112  289,062,497  137.9G  Microsoft basic data
/dev/sda3  291,743,744  869,009,895  577,266,152  275.3G  Microsoft basic data

(sda3 is in fact LUKS-encrypted ext4, not NTFS; sda2 is NTFS.)
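Mapping an LBA onto this layout is a simple range check. The sketch below copies the start and end sectors from the fdisk listing and also converts an offset inside sda3 into an ext4 block number. That second step rests on two labeled assumptions: a 4 KiB filesystem block, and a LUKS data offset of 32768 sectors (16 MiB), which is what the e2fsck and fdisk sizes suggest but which cryptsetup luksDump would confirm. Keep in mind that 268435455 is the largest 28-bit value and may not be a real failing LBA.

```python
from typing import NamedTuple, Optional, Tuple

class Partition(NamedTuple):
    name: str
    start: int  # first LBA (512-byte sectors)
    end: int    # last LBA, inclusive, as fdisk prints it

# Start/End values from the fdisk listing in this thread
PARTITIONS = [
    Partition("sda1", 2_048, 206_847),
    Partition("sda2", 239_616, 289_302_112),
    Partition("sda3", 291_743_744, 869_009_895),
]

def locate(lba: int) -> Optional[Tuple[str, int]]:
    """Return (partition, offset in sectors), or None if the LBA is outside every partition."""
    for p in PARTITIONS:
        if p.start <= lba <= p.end:
            return p.name, lba - p.start
    return None

def fs_block(offset: int, header_sectors: int = 32_768,
             block_size: int = 4096, sector: int = 512) -> Optional[int]:
    """Offset inside the partition -> filesystem block number behind a header.

    header_sectors=32768 (16 MiB) is an ASSUMED LUKS data offset; block_size=4096 is ASSUMED.
    Returns None when the offset falls inside the header itself.
    """
    if offset < header_sectors:
        return None
    return (offset - header_sectors) * sector // block_size

print(locate(291_803_192))   # ('sda3', 59448)
print(locate(268_435_455))   # ('sda2', 268195839)
print(fs_block(59_448))      # 3335 under the assumptions above
```

If debugfs can open the filesystem at all, its icheck request maps a block number to the inode using it and ncheck maps an inode to a path, but the block number is only as trustworthy as the assumed offsets.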
 
Joined
Oct 18, 2018
Messages
433
Thanks
206
#5
My drive is ext4; I am not using ZFS. I am mounting from the Linux command line (sudo mount /dev/sda1).
Are you using FreeNAS? If not, you'll probably find better help on a more Linux-focused site. These forums are mostly focused on FreeNAS, which is based on FreeBSD and uses ZFS as its file system.
 
Joined
May 14, 2019
Messages
3
Thanks
1
#6
I’ve had good results with a Linux program called ‘ddrescue’ for getting data off of failing disks. Basically you need another disk of larger size, and ddrescue patiently reads all the data it can off the failing disk and makes an image of it. Then you do whatever you have to do to fix the directory structure of the image - fsck or similar. Then rsync data from the image to another disk. Google is your friend for the details.
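The ddrescue approach is usually run as two passes over the same mapfile, which also lets an interrupted run resume instead of starting over. The Python sketch below builds the commands following the commonly recommended GNU ddrescue pattern (-n to skip the slow scraping phase on the first pass; -d for direct disc access and -r3 for three retry passes on the second) and prints them by default rather than running them. The device /dev/sdX3 and the file paths are placeholders; the image has to live on a different, healthy disk with enough free space.

```python
import shlex
import subprocess

def ddrescue_passes(source, image, mapfile):
    """Two GNU ddrescue passes sharing one mapfile so progress is kept between runs."""
    return [
        # Pass 1: copy everything that reads easily; -n skips the slow scraping phase
        ["ddrescue", "-n", source, image, mapfile],
        # Pass 2: -d uses direct disc access, -r3 retries the remaining bad areas 3 times
        ["ddrescue", "-d", "-r3", source, image, mapfile],
    ]

def run(commands, dry_run=True):
    """Print each command; only execute when dry_run is False (needs root and ddrescue)."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)

# Placeholder names: image only the damaged partition onto a different, healthy disk
run(ddrescue_passes("/dev/sdX3", "/mnt/healthy/sda3.img", "/mnt/healthy/sda3.map"))
```

Afterwards it is the image that gets unlocked and checked (starting with a read-only e2fsck -n), so no repair attempt touches the failing disk.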
 
Joined
Oct 18, 2018
Messages
433
Thanks
206
#7
I’ve had good results with a Linux program called ‘ddrescue’ for getting data off of failing disks. Basically you need another disk of larger size, and ddrescue patiently reads all the data it can off the failing disk and makes an image of it. Then you do whatever you have to do to fix the directory structure of the image - fsck or similar. Then rsync data from the image to another disk. Google is your friend for the details.
Have you tried this on a ZFS disk? And is this different from just using dd to clone the disk? In general I would be wary of performing data recovery this way with FreeNAS, for the simple reason that FreeNAS uses ZFS with vdevs that have parity built in to manage your data, and many vdev types are built such that recovering data off a single disk is impossible. Done properly, ZFS keeps your data safe even if an entire drive becomes completely inoperable. If you use ZFS this way, why bother using any tool other than ZFS's built-in resilvering to mirror the drive to another drive?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,067
Thanks
1,040
#8
Your drive has thousands of read errors, hundreds of pending sectors and over a thousand reallocated ones, and it is failing SMART tests. It's toast.
 
Joined
May 14, 2019
Messages
3
Thanks
1
#9
@PhiloEpisteme: sorry, I should have provided more context. It appears that the original poster is not using ZFS, so I was just suggesting an appropriate tool for his failing disk with a Linux filesystem on it. I wasn't suggesting that it is appropriate for a failing ZFS disk, especially if you have redundancy in your pool; you're right, it's better to just use the built-in functionality of ZFS to repair your data. I suppose in some extreme circumstance one could use ddrescue in a ZFS scenario (a failing single-disk pool), but that's not what the original poster is dealing with. I think he googled SMART and landed on our forum since our members discuss and use it.
 
Joined
Oct 18, 2018
Messages
433
Thanks
206
#10
I think he googled SMART and landed on our forum since our members discuss and use it.
You're probably right. Thanks for the context of your post; that makes a lot of sense. I chimed in as I did to discourage newer, less experienced FreeNAS users from using something outside of ZFS to manage the integrity of their pools and disks. As you rightly pointed out, though, this specific situation doesn't involve ZFS, so your suggestion is much more appropriate than my initial reply implied.
 