Help: Data Corruption, now what??

Status
Not open for further replies.

esamett

Patron
Joined
May 28, 2011
Messages
345
I had some loose cabling after I "borrowed" my CPU to diagnose a friend's computer failure. This generated a daily output message of "Degradation." [Daily Output 1 - Degradation]

My FreeNAS server was not shut down, as I had thought it was, when I fixed the cabling. Now I have a "Data Corruption" error. [Daily Output 2 - Corruption]

I ran a scrub, and the GUI says the pool is online, but zpool status on the CLI still gives an error message. What else do I need to do? How do I identify corrupted files and clear the error messages?

Thank you.
 

Attachments

  • Daily Output 1 - degradation.txt
    3.4 KB · Views: 258
  • Daily Output 2 - corruption.txt
    3.6 KB · Views: 217

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
Run zpool clear pool, then zpool scrub pool (substituting your pool's name for "pool").
If everything looks fine after that, you're good to go.
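
For anyone following along, the clear-then-scrub sequence above could be wrapped roughly like this. It is only a sketch: the function names and the ZPOOL override variable are made up for illustration (they are not part of ZFS or FreeNAS), and "video" in the usage note is just the pool name that appears later in this thread. The -v flag on zpool status is worth knowing for the original question, since it lists any files with permanent errors.

```shell
#!/bin/sh
# Sketch only. ZPOOL is an illustrative override hook (not a ZFS feature) so the
# sequence can be dry-run with ZPOOL=echo; by default it runs the real zpool.
ZPOOL="${ZPOOL:-zpool}"

# clear_and_scrub POOL: reset the error counters, then start a new scrub.
# The scrub is only started if the clear succeeded.
clear_and_scrub() {
    pool="$1"
    "$ZPOOL" clear "$pool" && "$ZPOOL" scrub "$pool"
}

# show_errors POOL: verbose status; -v lists any files with permanent errors.
show_errors() {
    "$ZPOOL" status -v "$1"
}
```

Typical use would be clear_and_scrub video, then show_errors video once the scrub has finished. The scrub runs in the background, so the status will show it as in progress for a while.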
 

esamett

Patron
Joined
May 28, 2011
Messages
345
I did several cycles of scrub/clear and still have this message in daily output:

Checking status of zfs pools:
  pool: video
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 4h17m, 36.98% done, 7h18m to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        video                                           ONLINE       0     0     0
          raidz2                                        ONLINE       0     0     0
            gpt/disk0                                   ONLINE       0     0     0
            gpt/disk1                                   ONLINE       0     0     0
            gpt/disk2                                   ONLINE       0     0     0
            gpt/disk3                                   ONLINE       0     0     0
            gpt/disk4                                   ONLINE       0     0     0
            gpt/disk5                                   ONLINE       0     0     0
            gpt/disk6                                   ONLINE       0     0     0
            gpt/disk7                                   ONLINE       0     0     0
            gptid/22b175d2-8e5a-11e0-b823-0021978fb278  ONLINE       0     0    15  15K repaired
            gpt/disk9                                   ONLINE       0     0     9  9K repaired

After the scrub finished, the "repaired" numbers increased to 42K and 33K respectively.

What is going on and what do I need to do?
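
When reading status output like the above, one way to pick out the affected devices is to filter the config table for rows with a non-zero READ, WRITE or CKSUM counter. A minimal sketch follows; the function name devices_with_errors is hypothetical, and it assumes the column layout shown in this post (NAME STATE READ WRITE CKSUM).

```shell
#!/bin/sh
# devices_with_errors: read zpool status text on stdin and print
# "name read write cksum" for every config row with a non-zero counter.
# Assumes the NAME STATE READ WRITE CKSUM column order shown in this thread.
devices_with_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # Device rows have three counter columns that start with a digit.
        in_config && NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ {
            if ($3 != "0" || $4 != "0" || $5 != "0") print $1, $3, $4, $5
        }
    '
}
```

It would be used as zpool status video | devices_with_errors. On the output in this post it should print the gptid/22b175d2-... device and gpt/disk9, the two rows with checksum errors.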
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
This is what I might try. Since you have raidz2, you can lose 2 disks and still be safe.

Look up the serial numbers of the two drives reporting errors in the GUI.
Shut down and remove ONE of the disks matching the serial numbers you looked up.
Run a surface scan on it with a good disk tool and see if it passes.
Wipe it completely.
Put it back in.
Boot up, tell the GUI to replace it, and let it resilver.
See if there are still errors reported for that disk.
If not, repeat the same steps for the other disk.
You may need to think about replacing 1 or 2 disks, but don't use the array until all the disks are in it and it's resilvered. In theory that should be ok, but if it were me I wouldn't change anything until you eliminate the problem.

If you really want to be safe and have more time/disks etc., back it up first, but you have Z2, so if you are careful you should be fine. You also might try some different cables and do another scrub first....
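
If the GUI is not handy, the serial-number lookup and the disk test in the steps above could also be done from the shell with smartctl, assuming smartmontools is installed. This is a rough sketch, not the only way to do it: the function names and the SMARTCTL override variable are made up for illustration, /dev/ada0 is a placeholder device name, and a SMART long self-test is just one option for the "surface scan" step.

```shell
#!/bin/sh
# Sketch only. SMARTCTL is an illustrative override hook so the helpers can be
# tried against a stub; by default the real smartctl is used.
SMARTCTL="${SMARTCTL:-smartctl}"

# disk_serial DEVICE: print the serial number from the smartctl -i info section.
disk_serial() {
    "$SMARTCTL" -i "$1" | awk -F': *' '/^Serial Number:/ { print $2 }'
}

# start_long_test DEVICE: start an extended SMART self-test, which reads the
# whole disk in the background; results show up later in smartctl -a.
start_long_test() {
    "$SMARTCTL" -t long "$1"
}
```

For example, disk_serial /dev/ada0 to match a device against the label on the physical drive, then start_long_test /dev/ada0 and check smartctl -a /dev/ada0 after the estimated run time has passed.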
 