Is this as bad as I think? (SMART Raw_Read_Error_Rate & Hardware_ECC_Recovered)

Status
Not open for further replies.

nfriedly

Cadet
Joined
Jan 15, 2016
Messages
5
Hi, I just set up my first FreeNAS today, and I was having trouble with it: I couldn't mount the CIFS share, and I also couldn't turn on the SMART service. After some fooling around on the command line, I learned that I have to include -d sat with my smartctl commands. After that, I was able to get this output from a short test:

Code:
[root@freenas] ~# smartctl -d sat -A /dev/da0
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p28 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   066   065   006    Pre-fail  Always       -       4036414
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       41272
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   062   040    Old_age   Always       -       38 (Min/Max 16/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   038   040   000    Old_age   Always       -       38 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   021   020   000    Old_age   Always       -       4036414
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       2 (142 95 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1092944
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2943470
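Since the rest of the thread keeps reading individual rows out of this table, here is a minimal Python sketch that parses the attribute rows of smartctl -A output into a dict keyed by attribute ID. It assumes the ten-column layout shown above and treats the raw column as free text, keeping its leading integer separately; parse_attributes is a hypothetical helper, not part of smartmontools.

```python
import re

# One attribute row of `smartctl -A` output (ATA layout as shown above):
# ID, name, flag, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, raw column.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Map attribute ID -> fields for every attribute row in smartctl -A text.

    Hypothetical helper: assumes the column layout shown in this thread.
    """
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # banner, section headers, column header, blank lines
        raw_text = m.group(10).strip()
        # The raw column is free-form (e.g. "38 (Min/Max 16/38)"), so keep
        # the full text and also its leading integer, if there is one.
        lead = re.match(r"\d+", raw_text)
        attrs[int(m.group(1))] = {
            "name": m.group(2),
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "thresh": int(m.group(6)),
            "type": m.group(7),
            "raw": int(lead.group()) if lead else None,
            "raw_text": raw_text,
        }
    return attrs


# A few rows copied from the dump above.
sample = """ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 066 065 006 Pre-fail Always - 4036414
7 Seek_Error_Rate 0x000f 100 253 045 Pre-fail Always - 41272
190 Airflow_Temperature_Cel 0x0022 062 062 040 Old_age Always - 38 (Min/Max 16/38)
"""
attrs = parse_attributes(sample)
print(attrs[1]["raw"], attrs[7]["worst"], attrs[190]["raw_text"])
```

The regex only accepts lines that start with a numeric ID, so the banner and the "ID#" header are skipped without special cases.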


This is a brand new Seagate NAS HDD ST6000VN0021 connected via one of these: http://www.tripplite.com/products/model-print/mid/5652. That enclosure is presumably why I need to specify the device type, and perhaps also the source of the problems with my SMART service.

The other two drives are WD Reds that report 0 for Raw_Read_Error_Rate and don't have a Hardware_ECC_Recovered line at all.

So, my main question is: is that Seagate drive bad? Should I just send it back and ask for a replacement?

I'm also curious about the SMART service and getting it to work with my USB drive bays. Does anyone have advice there? (I'm using USB bays because my "server" is actually a laptop that I had sitting around with a ton of RAM in it.)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The drive's not doing well. Perform a SMART long test and let it fail, then RMA the sorry thing.

Also, don't use "USB drive bays." Sorry to be blunt, but using a laptop as your "server" doesn't make it a good server. It lacks most of the characteristics that make for a stable server.

If you don't care about your data, feel free to charge onward. If you are coming to FreeNAS and ZFS to protect and store your data, find something more suited to server duty, like a server motherboard and a chassis that can hold your drives.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
There is nothing wrong with that drive. ID 1 is, more often than not, meaningless, and ID 195 is vendor-specific and, just like ID 1, meaningless in most cases, including this one. Look at my tagline for Decoding your SMART Data for more info.

FreeNAS is not designed to use USB as an interface for storage. As @jgreco said, don't use it that way or you are basically on your own.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There's a significant seek error rate going on, though.
 

nfriedly

Cadet
Joined
Jan 15, 2016
Messages
5
Yeah, I know this isn't the ideal hardware for FreeNAS, but it's what was available. This system will be used as a home media and backup server, and any really important data is already backed up elsewhere, so I can't really justify spending significant extra money on a server motherboard and chassis (and CPU, and ECC RAM, and PSU, etc.).

I'll do a longer burn-in and report back in a day or two (not sure how long it will take for 6TB drives...).

FWIW, several of the SMART values have already increased after just doing the conveyance test:

Code:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   067   065   006    Pre-fail  Always       -       5222582
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       3
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   045    Pre-fail  Always       -       108639
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       5
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   062   062   040    Old_age   Always       -       38 (Min/Max 16/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       7
194 Temperature_Celsius     0x0022   038   040   000    Old_age   Always       -       38 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   022   020   000    Old_age   Always       -       5222582
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       4 (172 168 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2279112
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       2943470
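Comparing the two dumps by eye is error-prone, so here is a minimal Python sketch that reports which raw counters moved between them. The dicts are transcribed by hand from a subset of rows in the two dumps, and raw_deltas is a hypothetical helper. A change in a vendor-specific raw counter only says that the counter moved, not that anything failed.

```python
def raw_deltas(before, after):
    """Return {attribute_id: change} for attributes whose raw value changed.

    `before` and `after` map SMART attribute IDs to integer raw values,
    e.g. transcribed from two `smartctl -A` snapshots of the same drive.
    """
    return {
        attr_id: after[attr_id] - before[attr_id]
        for attr_id in sorted(before.keys() & after.keys())
        if after[attr_id] != before[attr_id]
    }


# Subset of raw values from the first post and from after the conveyance test.
first = {1: 4036414, 5: 0, 7: 41272, 9: 3, 195: 4036414, 197: 0, 198: 0, 241: 1092944}
second = {1: 5222582, 5: 0, 7: 108639, 9: 5, 195: 5222582, 197: 0, 198: 0, 241: 2279112}

for attr_id, change in raw_deltas(first, second).items():
    print(attr_id, change)
```

On these numbers IDs 1, 195, and 241 (Total_LBAs_Written) all rose by exactly 1186168, which hints (an inference, not something the drive documents) that on this drive the first two move with I/O volume, while IDs 5, 197, and 198 stay at 0.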
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
jgreco said:
There's a significant seek error rate going on, though.
On its own, I doubt it means anything other than a possible performance hit, since the drive is either overshooting or undershooting its target.
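Whether that seek error rate is "significant" depends on how the raw number is read. One commonly repeated interpretation, which is unofficial and not documented by Seagate, is that on many Seagate drives the Seek_Error_Rate raw value is a 48-bit field whose high 16 bits count seek errors and whose low 32 bits count seeks. Treat that layout as an assumption; under it, this Python sketch splits the raw values quoted in this thread (split_raw48 is a hypothetical helper):

```python
def split_raw48(raw):
    """Split a 48-bit SMART raw value into (high 16 bits, low 32 bits).

    ASSUMPTION (unofficial, community-reported, not documented by Seagate):
    for Seek_Error_Rate on many Seagate drives, the high 16 bits hold the
    seek-error count and the low 32 bits hold the total number of seeks.
    """
    if not 0 <= raw < (1 << 48):
        raise ValueError("raw value does not fit in 48 bits")
    return raw >> 32, raw & 0xFFFFFFFF


# Seek_Error_Rate raw values quoted in this thread.
for label, raw in [
    ("first dump", 41272),
    ("after conveyance test", 108639),
    ("Yatti420's drive", 19441945),
]:
    errors, seeks = split_raw48(raw)
    print(f"{label}: {errors} seek errors in {seeks} seeks")
```

All three values are below 2**32, so under this assumption each decodes to zero seek errors and the raw number simply tracks how many seeks the drive has done. If the layout doesn't hold for a given model, the split means nothing.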

nfriedly said:
FWIW, several of the SMART values have already increased after just doing the conveyance test:
This doesn't surprise me at all, and they will continue to change. When you have a few counts in IDs 5, 197, or 198, then you do have a mechanical failure. Run the long test, and if you have some value in one of those IDs, then I'd RMA it. That's just my advice.
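That rule of thumb is easy to apply mechanically after the long test. A minimal Python sketch, assuming the raw values have already been pulled out of smartctl -A output into a dict keyed by attribute ID; the function name and dict shape are illustrative, not part of any tool, and the rule itself is joeschmuck's advice rather than a vendor specification:

```python
# IDs singled out above: any non-zero raw count here suggests a real
# mechanical problem (rule of thumb from this thread, not a vendor spec).
CRITICAL_IDS = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def rma_candidates(raw_values):
    """Return names of critical attributes whose raw value is non-zero.

    `raw_values` maps attribute ID -> integer raw value. IDs the drive
    does not report are treated as 0 and therefore not flagged.
    """
    return [
        name
        for attr_id, name in sorted(CRITICAL_IDS.items())
        if raw_values.get(attr_id, 0) > 0
    ]


# Raw values from the Seagate after the conveyance test.
after_conveyance = {1: 5222582, 5: 0, 7: 108639, 195: 5222582, 197: 0, 198: 0}
print(rma_candidates(after_conveyance))  # []
```

Note that the large ID 1, 7, and 195 values are deliberately ignored here, which matches the advice that those attributes say little on their own.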
 

nfriedly

Cadet
Joined
Jan 15, 2016
Messages
5
Hi folks, a final update: badblocks finished after ~900 hours and reported no problems. I then popped the drive into a Windows machine and ran SeaTools to check the SMART stats, and it gave the drive a clean bill of health. So it sounds like I have a good drive after all.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
900 hours of testing is quite a bit, maybe excessive for my tastes, but it's good to know that the drive is solid.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
nfriedly said:
Hi folks, a final update: badblocks finished after ~900 hours and reported no problems. I then popped the drive into a Windows machine and ran SeaTools to check the SMART stats, and it gave the drive a clean bill of health. So it sounds like I have a good drive after all.
How many runs? That sounds absolutely crazy. Four passes take 3-4 days on 3TB WD Reds.
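Ericloewe's figure gives a rough sanity check on the reported ~900 hours. A minimal Python sketch, assuming run time scales linearly with capacity (same number of passes, same sustained throughput); anything that changes throughput, such as the USB connection, breaks that assumption, and the number of passes in the ~900-hour run isn't stated. scale_duration_hours is a hypothetical helper:

```python
def scale_duration_hours(ref_days, ref_capacity_tb, target_capacity_tb):
    """Scale a measured badblocks run time linearly with drive capacity.

    ASSUMPTION: same number of passes and the same sustained throughput on
    both drives, so run time grows in proportion to capacity.
    """
    return ref_days * 24 * target_capacity_tb / ref_capacity_tb


# Four passes took 3-4 days on 3TB WD Reds; scale that to a 6TB drive.
low = scale_duration_hours(3, 3, 6)
high = scale_duration_hours(4, 3, 6)
print(f"expected for 6TB: {low:.0f}-{high:.0f} hours (reported: ~900)")
```

That puts a comparable 6TB run at roughly 144 to 192 hours, so ~900 hours is several times longer than linear scaling predicts, which is why the number of passes matters.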
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
This is a Seagate. Looks fine. Mine even show up with uncorrectable/pending sectors, and then the sectors (read correctly, I presume) disappear ;) That's a good amount of badblocks testing. Safe to use, I would say. Seagate reports seek errors and hardware ECC (I think) differently than WD. I don't have the ECC attribute (ID 195) on mine; I guess the desktop drive is a little simpler.

Code:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       79449688
  3 Spin_Up_Time            0x0003   096   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       19441945
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1319
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       13
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   061   045    Old_age   Always       -       34 (Min/Max 20/39)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       39
194 Temperature_Celsius     0x0022   034   040   000    Old_age   Always       -       34 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       1318h+51m+20.839s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6306575940
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       169473562582
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@Yatti420 Your drive shouldn't show IDs 5, 197, and 198 with values other than zero; otherwise you likely have a failing drive. If you do have these values and then run badblocks and the values go away, well, that is odd, or maybe I misunderstood what you were saying. Of course, ID 1 is nonsense for us and likely goes hand in hand with ID 7. Both are of no consequence, other than that the heads may be undershooting or overshooting the target and then have to correct for it. Again, nothing to worry about.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
joeschmuck said:
@Yatti420 Your drive shouldn't show IDs 5, 197, and 198 with values other than zero; otherwise you likely have a failing drive. If you do have these values and then run badblocks and the values go away, well, that is odd, or maybe I misunderstood what you were saying. Of course, ID 1 is nonsense for us and likely goes hand in hand with ID 7. Both are of no consequence, other than that the heads may be undershooting or overshooting the target and then have to correct for it. Again, nothing to worry about.

Exactly as you say, I had pending/uncorrectable sectors (read errors?) on the drive. So when they (eventually) read correctly, my understanding is that the value goes back to 0. You would think the sector would be remapped, but not according to the other SMART attributes. I had it happen to more than one drive. I do have a recapped PSU, so power issues could be a factor, but voltages are well within spec. Nothing ever showed in #5 (reallocated). I think even a drive I pulled previously had the same value in 198 (offline uncorrectable) as well, which then disappeared after badblocks and some SMART tests. I just went back through the box, and it's more than one drive. It's weird, to say the least.

I was thinking they were having their first hiccup or something, hence why they were pulled, but with a box of seemingly good drives it's odd.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
If you run a long SMART test and all looks good, well, hopefully it stays that way. Either way, just make sure you have a backup of your important data in case you have a real problem (I'm sure you have heard that advice and likely given it yourself). I can't speak to the PSU unless you have an O'scope so you can really see whether there is any noise on the power lines, but I would expect your system to crash periodically if you had bad power. This is one of the reasons for running MemTest86 for a long period of time: it can help sort through some of the electrical issues, though not surges when a drive spins up. Again, likely not the PSU.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Yatti420 said:
So when they (eventually) read correctly, my understanding is that the value goes back to 0. You would think the sector would be remapped, but not according to the other SMART attributes.
Pending can revert to zero after a successful read or write. The latter can mask a problem, since the sector may fail to read again in the future. As far as I know, only a failed write leads to a remap.
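Robert's description, together with Yatti420's observation that pending counts vanished while ID 5 stayed at 0, can be captured in a toy model. This is a hypothetical Python sketch of the behaviour as described in this post ("as far as I know"), not how any particular drive firmware works; the class and method names are made up:

```python
class PendingSectorModel:
    """Toy model of pending vs. reallocated sectors as described above.

    HYPOTHETICAL simplification (real firmware is vendor-specific):
    a failed read marks a sector pending; a later successful read or write
    clears it without a remap; only a failed write reallocates it.
    """

    def __init__(self):
        self.pending = set()   # LBAs that would count toward ID 197
        self.reallocated = 0   # count that would show up in ID 5

    def read(self, lba, ok):
        if ok:
            self.pending.discard(lba)  # successful read clears pending
        else:
            self.pending.add(lba)      # unreadable sector becomes pending

    def write(self, lba, ok):
        self.pending.discard(lba)      # a write resolves the pending state
        if not ok:
            self.reallocated += 1      # only a failed write remaps


drive = PendingSectorModel()
drive.read(1234, ok=False)  # marginal sector: pending count becomes 1
drive.read(1234, ok=True)   # retry succeeds: pending count back to 0
print(len(drive.pending), drive.reallocated)  # 0 0
```

In this model the sector that "healed" on a retry is never remapped, which is exactly why a zero pending count can mask a marginal sector that may fail to read again later.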
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Makes sense. That's what I believe is happening.
 