Dying hard disk

bobbbbbbb

Cadet
Joined
Feb 24, 2019
Messages
6
Hi everyone... Am I correct to say that my hard disk is busy dying and should be replaced ASAP?

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   091   006    Pre-fail  Always       -       73471080
  3 Spin_Up_Time            0x0003   096   090   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   043   043   020    Old_age   Always       -       59068
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       720
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       133761621934
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       18183
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   043   043   020    Old_age   Always       -       58995
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1800
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       17180131342
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   044   045    Old_age   Always   In_the_past 38 (0 2 38 34 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   045   045   000    Old_age   Always       -       110805
193 Load_Cycle_Count        0x0032   041   041   000    Old_age   Always       -       119956
194 Temperature_Celsius     0x0022   038   056   000    Old_age   Always       -       38 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   114   099   000    Old_age   Always       -       73471080
197 Current_Pending_Sector  0x0012   068   001   000    Old_age   Always       -       10536
198 Offline_Uncorrectable   0x0010   068   001   000    Old_age   Offline      -       10536
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       15423 (179 108 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       125410769802
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       1521002467734
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
If you put your listing inside code tags, like this: [code] your text [/code], it will display in a more readable way.
Like this:
Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   081   063   044    Pre-fail  Always       -       166128133
  3 Spin_Up_Time            0x0003   094   094   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       6
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   084   060   045    Pre-fail  Always       -       315392430
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       3644
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       6
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   058   054   040    Old_age   Always       -       42 (Min/Max 37/45)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       252
194 Temperature_Celsius     0x0022   042   046   000    Old_age   Always       -       42 (0 25 0 0 0)
195 Hardware_ECC_Recovered  0x001a   054   006   000    Old_age   Always       -       166128133
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3630         -
# 2  Short offline       Completed without error       00%      3629         -
# 3  Short offline       Completed without error       00%      3606         -
# 4  Short offline       Completed without error       00%      3605         -
# 5  Extended offline    Completed without error       00%      3573         -
# 6  Short offline       Completed without error       00%      3558         -
# 7  Short offline       Completed without error       00%      3557         -
# 8  Extended offline    Completed without error       00%      3525         -
# 9  Short offline       Completed without error       00%      3510         -
#10  Short offline       Completed without error       00%      3509         -
#11  Short offline       Completed without error       00%      3486         -
#12  Short offline       Completed without error       00%      3485         -
#13  Short offline       Completed without error       00%      3462         -
#14  Short offline       Completed without error       00%      3461         -
#15  Short offline       Completed without error       00%      3438         -
#16  Short offline       Completed without error       00%      3437         -
#17  Extended offline    Completed without error       00%      3405         -
#18  Short offline       Completed without error       00%      3390         -
#19  Short offline       Completed without error       00%      3389         -
#20  Extended offline    Completed without error       00%      3357         -
#21  Short offline       Completed without error       00%      3342         -

Have you been running periodic SMART tests? How do those results look?
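A minimal sketch of scanning a self-test log like the one above for anything other than a clean pass, assuming the input is the plain-text self-test section printed by `smartctl -l selftest` with the column layout shown in this thread (other smartctl versions or drives may format it differently). `summarize_self_tests` is a hypothetical helper, not part of smartctl.

```python
import re

# One row of the self-test log, e.g.
# "# 1  Short offline       Completed without error       00%      3630         -"
# Assumption: layout matches the listing in this thread.
ROW = re.compile(
    r"^#\s*(\d+)\s+(\w+ (?:offline|captive))\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def summarize_self_tests(log_text):
    """Return (number of test rows parsed, list of rows that did not pass cleanly)."""
    total = 0
    problems = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # header line or unrelated output
        total += 1
        num, kind, status, _remaining, hours, lba = match.groups()
        if status != "Completed without error":
            # Keep the test number, type, status, lifetime hours and first-error LBA.
            problems.append((int(num), kind, status, int(hours), lba))
    return total, problems


sample = """Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      3630         -
# 5  Extended offline    Completed without error       00%      3573         -"""
print(summarize_self_tests(sample))
```

A periodic job that runs the tests and then feeds the log through a check like this is one way to catch a drive going bad before the pending-sector count climbs.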
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
After a bit of scrutiny, I would say it is definitely failing and these lines appear to be indicators of problems:
Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
5   Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       720
9   Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       18183
12  Power_Cycle_Count       0x0032   043   043   020    Old_age   Always       -       58995
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       1800
192 Power-Off_Retract_Count 0x0032   045   045   000    Old_age   Always       -       110805
193 Load_Cycle_Count        0x0032   041   041   000    Old_age   Always       -       119956
197 Current_Pending_Sector  0x0012   068   001   000    Old_age   Always       -       10536
198 Offline_Uncorrectable   0x0010   068   001   000    Old_age   Offline      -       10536
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       15423 (179 108 0)

Reallocated_Sector_Ct should always be zero (I'm talking about raw values); if it is not, the drive definitely has bad sectors. Is it still under warranty?

The Offline_Uncorrectable count and Current_Pending_Sector count are both nonzero. That definitely indicates a problem, but I wonder about the reason why.
The Power_On_Hours and Head_Flying_Hours are in the range of 15k to 18k, while the Power_Cycle_Count is over 58k. Excessive power cycling can contribute significantly to early drive death. Based on the Power_On_Hours, this drive should still be a baby, but looking at the Power_Cycle_Count, I am not so sure. Every power cycle puts unneeded stress on the drive; it may have lasted years longer without all those on/off cycles.

PS. The Load_Cycle_Count is over 100k. That is more than most drives are rated for. This looks like you tried to do some power management and it killed the drive prematurely.
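The rule of thumb in this post (these raw counts should be zero on a healthy drive) can be sketched as a small check over `smartctl -A` output. This is a minimal sketch, assuming the column layout shown in this thread; the watched attribute list comes from this post rather than from any vendor threshold, and `parse_attributes` / `flag_bad_sectors` are hypothetical helper names. Vendor-encoded raw values such as Seagate's Raw_Read_Error_Rate are deliberately left out.

```python
# Attributes whose raw value, per this thread, should be zero on a healthy drive.
WATCH = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(table_text):
    """Parse smartctl attribute rows into {ATTRIBUTE_NAME: RAW_VALUE string}."""
    attrs = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the header row ("ID# ...") is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            # RAW_VALUE may contain spaces, e.g. "38 (0 2 38 34 0)".
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def flag_bad_sectors(attrs):
    """Return {name: count} for watched attributes whose leading raw number is nonzero."""
    flagged = {}
    for name in WATCH:
        if name in attrs:
            count = int(attrs[name].split()[0])
            if count > 0:
                flagged[name] = count
    return flagged


# Rows taken from the first post in this thread.
sample = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       720
197 Current_Pending_Sector  0x0012   068   001   000    Old_age   Always       -       10536"""
print(flag_bad_sectors(parse_attributes(sample)))
```

Run against the healthy listing earlier in the thread, the same check returns an empty dictionary, since all four raw values there are 0.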
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

bobbbbbbb

Cadet
Joined
Feb 24, 2019
Messages
6
The hard drive should still be under warranty. I had some power failures in the past few weeks, which could explain why the Load_Cycle_Count is over 100k.
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
bobbbbbbb said:
I had some power failures in the past few weeks, which could explain why the Load_Cycle_Count is over 100k.

No, not really. There is no way a power failure or a brownout can produce a Load_Cycle_Count of 100k. That definitely looks like a power management issue: something like overly aggressive power management causing a head unload followed by a near-immediate wakeup.
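A quick back-of-the-envelope check of this point, using the raw values from the first post: spreading the Load_Cycle_Count over the powered-on time gives the average number of load/unload cycles per day. A minimal sketch; `cycles_per_day` is a hypothetical helper, and because Power_On_Hours counts only powered-on time, the result is a per-powered-on-day average.

```python
def cycles_per_day(power_on_hours, cycle_count):
    """Average load/unload cycles per day of powered-on time."""
    powered_on_days = power_on_hours / 24
    return cycle_count / powered_on_days


# Raw values from the first post: Power_On_Hours (ID 9) and Load_Cycle_Count (ID 193).
rate = cycles_per_day(18183, 119956)
print(f"{rate:.1f} load cycles per powered-on day")
```

That comes out at roughly 158 cycles per day. Even one outage every day over the same period would add fewer than 800 cycles, far short of the 119,956 recorded.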
 

bobbbbbbb

Cadet
Joined
Feb 24, 2019
Messages
6
OK, I am not sure then, but I will take it back under warranty.
I hope they won't give me issues, but based on the SMART logs they should replace it.
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
18183 hours = 1,090,980 minutes. The Load_Cycle_Count and Power-Off_Retract_Count are roughly the same, and 1,090,980 / 110805 = 9.84 minutes. Something is powering this disk off every 10 minutes and immediately spinning it back up. I'll stick my neck out and say it killed your disk.
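The arithmetic above can be reproduced with a short script. A minimal sketch using the raw values from the first post; `minutes_between_events` is a hypothetical helper. It yields an average interval only, so on its own it cannot prove the events happened on a fixed timer.

```python
def minutes_between_events(power_on_hours, event_count):
    """Average powered-on minutes between events (head retracts or load cycles)."""
    return power_on_hours * 60 / event_count


# Raw values from the first post.
POWER_ON_HOURS = 18183       # attribute 9
POWER_OFF_RETRACT = 110805   # attribute 192
LOAD_CYCLES = 119956         # attribute 193

print(f"{POWER_ON_HOURS * 60:,} powered-on minutes")
print(f"one retract every {minutes_between_events(POWER_ON_HOURS, POWER_OFF_RETRACT):.1f} min")
print(f"one load cycle every {minutes_between_events(POWER_ON_HOURS, LOAD_CYCLES):.1f} min")
```

Both counters land in the 9 to 10 minute range, which is why the interval looks like a timer rather than random outages.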
 

bobbbbbbb

Cadet
Joined
Feb 24, 2019
Messages
6
rvassar said:
18183 hours = 1,090,980 minutes. The Load_Cycle_Count and Power-Off_Retract_Count are roughly the same, and 1,090,980 / 110805 = 9.84 minutes. Something is powering this disk off every 10 minutes and immediately spinning it back up. I'll stick my neck out and say it killed your disk.

That's very interesting.
Any idea what it could be?

It is an 8TB Seagate "Archive" drive.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I sure would like to see more information about the rest of your hardware because I would like to root out the cause of this premature hard drive failure before you suffer from more problems just like this.

Please review this guide and post as much as you know about what you have done to try and make your system / drives spin down.

Updated Forum Rules 12/5/18
https://forums.freenas.org/index.php?threads/updated-forum-rules-12-5-18.45124/
 

rvassar

Guru
Joined
May 2, 2018
Messages
972
bobbbbbbb said:
That's very interesting.
Any idea what it could be?

It is an 8TB Seagate "Archive" drive.


No clue. It could be anything from a software setting to a flaky power cable or power supply. It might even be the drive itself. Can you post the exact model number?

Also... "Archive" drives might imply SMR recording, which is a performance killer in a NAS application.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
rvassar said:
Also... "Archive" drives might imply SMR recording, which is a performance killer in a NAS application.
Yes, they are SMR. Seagate also intended them for few writes and many reads, which is generally not very useful.
bobbbbbbb said:
Any idea what it could be?
Usually, when a drive is spinning down and back up like this, it is because the person who configured the NAS tried to enable power management but didn't do everything needed to stop FreeNAS from accessing the storage pool for other things, such as swap and the system dataset.
Ten minutes is a telling interval.
 

bobbbbbbb

Cadet
Joined
Feb 24, 2019
Messages
6
I am fairly new to FreeNAS. 757 days?!
That is a lot. The drive used to be passed through to an Ubuntu VM.
Maybe a Proxmox setting...
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
bobbbbbbb said:
I am fairly new to FreeNAS. 757 days?!
That is a lot. The drive used to be passed through to an Ubuntu VM.
Maybe a Proxmox setting...
It could be that the wear happened when the drive was serving a different purpose. The default configuration for FreeNAS is no power management of the drives, so they spin all the time, which is actually better for the drive from the perspective of wear.
Some companies will void the warranty with numbers like this. Give it a try though.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
bobbbbbbb said:
Could it be a BIOS setting?
I am using it on an HP MicroServer Gen8.
Sorry. I am not familiar with that system, but I have seen power management configuration settings in other systems.
 