Zpool Data Corruption Error...

Status
Not open for further replies.

bbddpp

Explorer
Joined
Dec 8, 2012
Messages
91
Before I continue, my system is just a JBOD media server. Each physical drive inside is its own zpool, so I can reformat any one of them independently without affecting the others.

media5 = ada0 (one zfs pool per hard drive).

One of the drives inside seems to be failing, as evidenced by this zpool status:

Code:
pool: media5
state: ONLINE
status: One or more devices has experienced an error resulting in data
    corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
    entire pool from backup.
  see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 6h51m with 23 errors on Sun Dec  8 06:51:11 2013
config:
 
    NAME                                          STATE    READ WRITE CKSUM
    media5                                        ONLINE      1    0    0
      gptid/b9e48da6-6785-11e2-a0bd-001a4d704eb5  ONLINE      1    0    0
 
errors: 1 data errors, use '-v' for a list
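For anyone who wants to watch these counters from a script rather than eyeballing the output, here's a minimal throwaway Python sketch that parses a status table like the one above (the device names are just the ones from my pool):

```python
import re

# Sample config table, copied from the `zpool status media5` output above.
STATUS = """\
NAME                                          STATE    READ WRITE CKSUM
media5                                        ONLINE      1    0    0
  gptid/b9e48da6-6785-11e2-a0bd-001a4d704eb5  ONLINE      1    0    0
"""

def error_counts(status_text):
    """Return {device: (read, write, cksum)} from a zpool status config table."""
    counts = {}
    for line in status_text.splitlines():
        # Match a device/pool row: name, state, then three numeric columns.
        m = re.match(r"\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL)"
                     r"\s+(\d+)\s+(\d+)\s+(\d+)", line)
        if m:
            counts[m.group(1)] = tuple(int(m.group(i)) for i in (3, 4, 5))
    return counts

print(error_counts(STATUS))
# The nonzero READ count on the gptid device is the failing disk.
```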


And this SMART test result:

Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  117  099  006    Pre-fail  Always      -      131390024
  3 Spin_Up_Time            0x0003  092  091  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      29
  5 Reallocated_Sector_Ct  0x0033  100  100  010    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  062  060  030    Pre-fail  Always      -      12890550128
  9 Power_On_Hours          0x0032  092  092  000    Old_age  Always      -      7037
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      29
183 Runtime_Bad_Block      0x0032  100  100  000    Old_age  Always      -      0
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  001  001  000    Old_age  Always      -      239
188 Command_Timeout        0x0032  100  099  000    Old_age  Always      -      4 4 7
189 High_Fly_Writes        0x003a  098  098  000    Old_age  Always      -      2
190 Airflow_Temperature_Cel 0x0022  062  060  045    Old_age  Always      -      38 (Min/Max 31/39)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      26
193 Load_Cycle_Count        0x0032  099  099  000    Old_age  Always      -      2921
194 Temperature_Celsius    0x0022  038  040  000    Old_age  Always      -      38 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012  099  099  000    Old_age  Always      -      200
198 Offline_Uncorrectable  0x0010  099  099  000    Old_age  Offline      -      200
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      1
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      1859h+21m+11.747s
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      21525060739
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      34367754187
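As I understand it, the attributes that matter most for sector health are a small handful of the IDs above. A little Python sketch that flags them from raw values like mine (treating any nonzero raw value as a red flag is my own rule of thumb, not a vendor threshold):

```python
# Attribute IDs commonly watched for sector health (a rule of thumb,
# not an official standard): any nonzero raw value is a red flag.
WATCH = {
    5:   "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

def red_flags(raw):
    """raw: {attribute_id: raw_value}. Return names of watched attrs with nonzero values."""
    return [name for attr_id, name in WATCH.items() if raw.get(attr_id, 0) > 0]

# Raw values copied from the smartctl output above.
drive = {5: 0, 187: 239, 197: 200, 198: 200}
print(red_flags(drive))
```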


Are there any commands I can run to try to zero out or mark the bad sectors and keep using this drive? It's a fairly new drive, and a large one at that, so I'd hate to have to trash it. Unfortunately I pulled it out of an external enclosure, so there's no warranty to speak of, alas.

Any suggestions on what I can try would be most welcome. I'm a little new to this world of ZFS pools and SMART tests, but I'm a quick study.

I've relocated all the data from this drive to another one, so I'm up for anything; nothing on the drive is at risk right now. It's basically empty.

Thanks to anyone who takes a few moments to help!
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
200 pending and 200 offline-uncorrectable sectors, plus a lot of read errors? I don't think mapping out sectors will help; the errors will just stack up again. Replace the disk.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, considering you already have 200 bad sectors, you'd have to do 200 long SMART tests to figure out which sectors are bad. But generally speaking, once you get beyond an occasional one or two bad sectors, things balloon out of control rapidly.

Your only feasible solution is to replace the disk. :(
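If you want to experiment anyway, since the disk is empty: the usual (destructive) trick on FreeBSD is to zero-fill the whole disk, which forces the firmware to either rewrite each pending sector in place or remap it to a spare, then re-test. Roughly like this — the device name is an assumption, and with counters this high I wouldn't expect miracles:

```shell
# DESTRUCTIVE: wipes the entire disk. Device name assumed; adjust for your system.
DEV=/dev/ada0

# Check the pending/offline sector counters first.
smartctl -A "$DEV" | egrep 'Current_Pending_Sector|Offline_Uncorrectable'

# Zero-fill the whole disk; writing to a pending sector forces the firmware
# to either rewrite it in place or remap it to a spare.
dd if=/dev/zero of="$DEV" bs=1m

# Run a long self-test, then re-check whether the counters dropped
# (reallocations show up in Reallocated_Sector_Ct instead).
smartctl -t long "$DEV"
smartctl -A "$DEV" | egrep 'Current_Pending_Sector|Offline_Uncorrectable'
```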
 

bbddpp

Explorer
Joined
Dec 8, 2012
Messages
91
Thanks for the replies. Wow, it's disappointing to have a 3TB disk fail in under a year. Is that normal? It's not like this thing even saw heavy use; it was only a media drive/server. Bummed.

Is it safe to use this drive for anything at this point, or am I just taking chances with any data I put on there? So there's no sense in totally destroying and re-creating the zpool on it, I guess?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It's not normal, but it's not uncommon either. Treat your hard drives like crap, or let them get tossed around in shipping, and they'll have a shorter lifespan.

I'd just throw the drive away and never use it for anything else.
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
It can happen; it also depends a lot on operating conditions like HDD temperature, etc. Maybe you can try to RMA it directly with the manufacturer?

I wouldn't trust any data on it.
 

bbddpp

Explorer
Joined
Dec 8, 2012
Messages
91
Okay, thanks guys. It's possible the external enclosure it was living in wasn't controlling its temperature well enough. I'll have to keep an eye on the other drives I put in there. Otherwise the drive was put in there and never moved at all, so it was taken pretty good care of, scrubbed regularly, etc.

I'll try an RMA, though I know it's tough when you've removed an external drive from its case and used it as an internal.
 

Sir.Robin

Guru
Joined
Apr 14, 2012
Messages
554
bbddpp said:
> Thanks for the replies. Wow, that's disappointing to have a 3TB disk fail in under a year. Is that normal? Not like this thing even saw heavy use, only as a media drive/server. Bummed.
>
> Is it safe to use this drive for anything at this point or am I just taking chances with any data I put in there? So no sense totally destroying and re-creating the zpool on it I guess?

I recently set up a SATA-based disk array at work with 48 Seagate ES 4TB drives. One disk failed at first boot. After two months of production, another one failed.
So... shit happens :)
 

bbddpp

Explorer
Joined
Dec 8, 2012
Messages
91
Indeed it does. Thank goodness for SMART and regular scrubs; at least we can hopefully avoid total disaster in cases like mine, where I don't have enough drives/space for a proper RAID setup.

Thanks all. Drive swapped, put back into its enclosure, and sent out for RMA. Hopefully they'll cover it.
 

bbddpp

Explorer
Joined
Dec 8, 2012
Messages
91
Indeed they did! It took a while, but I got a replacement drive back without any fanfare. Good on Seagate. I really did use the drive under normal conditions and took care of it, so I'm glad they did right by me and sent a fresh 3TB.

Thanks a lot for your replies and concern. It's so nice to see a green light in FreeNAS again, though I'm not sure what I'm gonna do when my enclosure runs out of drive bays.
 