HELP ZFS Pool data recovery

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Run zpool clear to reset the error count, then run another scrub.
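In shell form, roughly (pool name Tank taken from the status output further down the thread):

Code:
zpool clear Tank
zpool scrub Tank
# re-check once the scrub completes
zpool status -v Tank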

This is necromancy.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So the scrub finished. It still shows unhealthy, and it has 142 errors. How do I fix it?
What does zpool status -v show? Does it show "permanent errors in the following files"?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
What does zpool status -v show? Does it show "permanent errors in the following files"?

Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:16:52 with 142 errors on Tue Jun 20 13:12:18 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  ONLINE       0     0     0
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0     0
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0     0
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles@auto-2023-05-31_00-00:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
That it is: specifically, metadata describing or contained within the snapshot Tank/Data/windowsfiles@auto-2023-05-31_00-00.

Attempt a zpool clear Tank and then run the scrub again. If the issue persists, you may need to delete the snapshot in question - although in your situation, I'd still update your backups first.
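If it does come to removing the snapshot, something along these lines (snapshot name taken from your status output above; zfs destroy is irreversible, so only run it once your backups are current):

Code:
# confirm the snapshot exists and see how much space it holds
zfs list -t snapshot -r Tank/Data/windowsfiles
# remove only the affected snapshot
zfs destroy Tank/Data/windowsfiles@auto-2023-05-31_00-00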
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
That it is: specifically, metadata describing or contained within the snapshot Tank/Data/windowsfiles@auto-2023-05-31_00-00.

Attempt a zpool clear Tank and then run the scrub again. If the issue persists, you may need to delete the snapshot in question - although in your situation, I'd still update your backups first.

I'm already updating them right now :). I'll do another scrub once it's done backing up.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Once my unraid card comes in, I'm going to redo this whole thing, so if it never gets "healthy" it's not the end of the world. I'm gonna do RAIDZ2 this time lol
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Yeah, that's what I meant. I will actually have 5 drives in total.
3+2 RAIDZ2 would be even better, ~12T usable and two-drive redundancy.

I would question whether you're getting any value out of the cache and log devices. If you have a very small amount of RAM, the cache device isn't likely to fill itself with particularly good candidate data, and unless you're making synchronous writes (e.g. NFS) against the data, the log will sit idle.
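If you want to check whether they're actually seeing traffic, zpool iostat breaks the numbers down per device, and cache/log vdevs can be removed from a live pool without touching the data (device names below are the gptids from your status output):

Code:
# per-vdev I/O statistics, refreshed every 5 seconds
zpool iostat -v Tank 5
# remove the cache (L2ARC) and log (SLOG) devices; pool data is unaffected
zpool remove Tank gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351
zpool remove Tank gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351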
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
3+2 RAIDZ2 would be even better, ~12T usable and two-drive redundancy.

I would question whether you're getting any value out of the cache and log devices. If you have a very small amount of RAM, the cache device isn't likely to fill itself with particularly good candidate data, and unless you're making synchronous writes (e.g. NFS) against the data, the log will sit idle.
Yeah, I was going to do away with those. I will probably buy more RAM down the road, but I've spent enough money this past week.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It's hard to go wrong with "more RAM" when it comes to ZFS!

Do keep us posted on the status of the clear/scrub, and once you've got updated backups, we're more than happy to help with the PCIe passthrough setup of the new HBA to get your TrueNAS install nice and solid.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Will do. Thank you! I ended up going with an "LSI 6Gbps SAS HBA 9200-8i IT Mode". Is that a decent one to go with? I've heard of a lot of people using them for TrueNAS, and since I don't want to spend much, it looked like a good option.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I have to say that @HoneyBadger and @jgreco did one hell of a job here, and the data was not lost forever. I'm very impressed and pleased that the forum has people with the technical skill to pull this off. I know not all data can be saved, but even if some of it can be, it's a good day.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
So I redid the scrub, same result. I deleted all the snapshots and ran it again; here's the latest zpool status -v

Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:13:24 with 72 errors on Tue Jun 20 17:54:06 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  DEGRADED     0     0   288  too many errors
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  DEGRADED     0     0   284  too many errors
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  DEGRADED     0     0   280  too many errors
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
So I redid the scrub, same result. I deleted all the snapshots and ran it again; here's the latest zpool status -v
Did you run zpool clear Tank before one of those steps?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Did you run zpool clear Tank before one of those steps?

I didn't forget the first time, but I think I did after I deleted the snapshots and ran the scrub again. I just cleared it, and we'll see what this scrub turns up. Thanks for reminding me.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Here's the latest:
Code:
root@truenas[~]# zpool status -v
  pool: Tank
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 01:13:19 with 72 errors on Tue Jun 20 23:27:51 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        Tank                                            ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/e0b02732-0f6c-11ee-afa2-afa1770be631  ONLINE       0     0   144
            gptid/11bac542-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0   142
            gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351  ONLINE       0     0   140
        logs
          gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0
        cache
          gptid/111d9ded-ad95-11ed-8d1c-7df9cea98351    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        Tank/Data/windowsfiles:<0x1>

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Tue Jun 20 03:45:04 2023
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors
root@truenas[~]#

In the UI it still says unhealthy.
 
Joined
Oct 22, 2019
Messages
3,641
In the UI it still says unhealthy.
The GUI is probably parsing the output of zpool status -v, and if it hits any instance of "Permanent errors" or a non-zero number in the READ/WRITE/CKSUM columns, it interprets this as "UNHEALTHY". (Which is what you see on the GUI's Dashboard or Pools page.)

Just my guess.


Not sure what the explanation is for seeing ~140 checksum errors on all 3 drives.

The permanent error for Tank/Data/windowsfiles:<0x1> suggests metadata corruption for the dataset itself. (Not any specific file.)
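For what it's worth, you can check roughly the same things the middleware is presumably looking at from the shell (just a sketch; the exact logic TrueNAS uses may differ):

Code:
# one-line summary: prints "all pools are healthy" when nothing is flagged
zpool status -x
# pool-level health property (can still read ONLINE even with errors logged)
zpool list -H -o name,health
# full detail, including the "Permanent errors" list
zpool status -v Tank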
 