HELP ZFS Pool data recovery

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Run zpool clear to reset the error count, then run another scrub.
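In shell form, roughly (pool name Tank taken from the status output further down the thread):

Code:
zpool clear Tank
zpool scrub Tank
# re-check once the scrub completes
zpool status -v Tank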

This is necromancy.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So the scrub finished. It still shows unhealthy, and it has 142 errors. How do I fix it?
What does zpool status -v show? Does it show "permanent errors in the following files"?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
What does zpool status -v show? Does it show "permanent errors in the following files"?

Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:16:52 with 142 errors on Tue Jun 20 13:12:18 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  ONLINE       0     0     0
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0     0
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0     0
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles@auto-2023-05-31_00-00:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That it is: specifically, metadata describing or contained within the snapshot Tank/Data/windowsfiles@auto-2023-05-31_00-00.

Attempt a zpool clear Tank and then run the scrub again. If the issue persists, you may need to delete the snapshot in question - although in your situation, I'd still update your backups first.
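If it does come to removing the snapshot, something along these lines (snapshot name taken from your status output above; zfs destroy is irreversible, so only run it once your backups are current):

Code:
# confirm the snapshot exists and see how much space it holds
zfs list -t snapshot -r Tank/Data/windowsfiles
# remove only the affected snapshot
zfs destroy Tank/Data/windowsfiles@auto-2023-05-31_00-00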
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
That it is: specifically, metadata describing or contained within the snapshot Tank/Data/windowsfiles@auto-2023-05-31_00-00.

Attempt a zpool clear Tank and then run the scrub again. If the issue persists, you may need to delete the snapshot in question - although in your situation, I'd still update your backups first.

I'm already updating them right now :). I'll do another scrub once it's done backing up.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Once my unraid card comes in, I'm going to redo this whole thing, so if it never gets "healthy" it's not the end of the world. I'm gonna do RAIDZ2 this time lol
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yeah, that's what I meant. I will actually have 5 drives in total.
3+2 RAIDZ2 would be even better, ~12T usable and two-drive redundancy.

I would question whether you're getting any value out of the cache and log devices. If you have a very small amount of RAM, the cache device isn't likely to fill itself with particularly good candidate data, and unless you're making synchronous writes (e.g. NFS) against the data, the log will sit idle.
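If you want to check whether they're actually seeing traffic, zpool iostat breaks the numbers down per device, and cache/log vdevs can be removed from a live pool without touching the data (device names below are the gptids from your status output):

Code:
# per-vdev I/O statistics, refreshed every 5 seconds
zpool iostat -v Tank 5
# remove the cache (L2ARC) and log (SLOG) devices; pool data is unaffected
zpool remove Tank gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351
zpool remove Tank gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351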
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
3+2 RAIDZ2 would be even better, ~12T usable and two-drive redundancy.

I would question whether you're getting any value out of the cache and log devices. If you have a very small amount of RAM, the cache device isn't likely to fill itself with particularly good candidate data, and unless you're making synchronous writes (e.g. NFS) against the data, the log will sit idle.
Yeah, I was going to do away with those. I will probably buy more RAM down the road, but I've spent enough money this past week.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It's hard to go wrong with "more RAM" when it comes to ZFS!

Do keep us posted on the status of the clear/scrub, and once you've got updated backups, we're more than happy to help with the PCIe passthrough setup of the new HBA to get your TrueNAS install nice and solid.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Will do. Thank you! I ended up going with an "LSI 6Gbps SAS HBA 9200-8i IT Mode". Is that a decent one to go with? I've heard of a lot of people using them for TrueNAS, and since I don't want to spend much, it looked like a good option.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have to say that @HoneyBadger and @jgreco did one hell of a job here, and the data was not lost forever. I'm very impressed and pleased that the forum has people with the technical skill to pull this off. I know not all data can be saved, but even if some of it can be, it's a good day.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
So I redid the scrub, same result. I deleted all the snapshots and ran it again; here's the latest zpool status -v

Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:13:24 with 72 errors on Tue Jun 20 17:54:06 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  DEGRADED     0     0   288  too many errors
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  DEGRADED     0     0   284  too many errors
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  DEGRADED     0     0   280  too many errors
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
So I redid the scrub, same result. I deleted all the snapshots and ran it again; here's the latest zpool status -v
Did you run zpool clear Tank before one of those steps?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Did you run zpool clear Tank before one of those steps?

I didn't forget the first time, but I think I did after I deleted the snapshots and ran the scrub again. I just cleared it, and we'll see what this scrub turns up. Thanks for reminding me.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Here's the latest:
Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:13:19 with 72 errors on Tue Jun 20 23:27:51 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  ONLINE       0     0   144
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0   142
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0   140
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#

In the UI it still says unhealthy.
 
Joined
Oct 22, 2019
Messages
3,641
In the UI it still says unhealthy.
The GUI is probably parsing the output of zpool status -v, and if it hits any instance of "Permanent errors" or a non-zero number in the READ/WRITE/CKSUM columns, it interprets this as "UNHEALTHY". (Which is what you see on the GUI's Dashboard or Pools page.)

Just my guess.


Not sure what the explanation is for seeing ~140 checksum errors on all 3 drives.

The permanent error for Tank/Data/windowsfiles:<0x1> suggests metadata corruption for the dataset itself. (Not any specific file.)
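For what it's worth, you can check roughly the same things the middleware is presumably looking at from the shell (just a sketch; the exact logic TrueNAS uses may differ):

Code:
# one-line summary: prints "all pools are healthy" when nothing is flagged
zpool status -x
# pool-level health property (can still read ONLINE even with errors logged)
zpool list -H -o name,health
# full detail, including the "Permanent errors" list
zpool status -v Tank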
 