HELP ZFS Pool data recovery

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
The reason I'm recommending the additional drives is out of an abundance of caution, the fact that it isn't my data at risk, and that your most recent backup is "rather aged" by your own admission. I'm hoping that we can find a way to get you back to at least a point more recent than that.

Before performing any of the clone operations, record serial numbers of the disks and confirm them in the cloning software UI wherever possible.

On bare metal, if I wanted a simple UI, I'd use something like Clonezilla (possibly combined with a physical write blocker) or a live CD using dd. Since your host OS is Proxmox, which is Debian-based, you could also use dd from Proxmox, but again, be absolutely sure you are specifying the correct source and destination disks. If you dd an entire empty disk onto a good one, there's no walking that back.
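Something along these lines from the Proxmox shell, purely as a sketch (the sdX/sdY names are placeholders, not your actual disks; confirm serial numbers first, because dd will silently overwrite whatever destination you give it):

Code:
# List disks with model and serial to confirm which is which
lsblk -o NAME,SIZE,MODEL,SERIAL
# Clone the wiped source disk (placeholder /dev/sdX) onto the new blank disk (placeholder /dev/sdY)
dd if=/dev/sdX of=/dev/sdY bs=1M conv=noerror,sync status=progress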

Partition table editing is likely going to be a manual effort here, where we gpart list the table from one disk and then manually recreate it on the other; ditto the label edits. If zdb finds labels on Disk A, that's a good sign though.
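As a rough sketch of that manual effort on the FreeBSD side (the types, offsets, and sizes below are examples only and must come from the surviving disk, not from this post):

Code:
gpart list ada2                                    # read the intact table; note partition offsets and sizes
gpart create -s gpt ada0                           # lay down a fresh GPT on the blank disk
gpart add -t freebsd-swap -b 128 -s 4194304 ada0   # recreate partition 1 to match the survivor
gpart add -t freebsd-zfs ada0                      # recreate partition 2 in the remaining space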

For data recovery, a key principle is "don't ever write back to media you're attempting to recover from," so restoring the old zpool.cache file is best done to a separate location (a network drive), or by using a separate boot device (or Proxmox VM?) to make a fresh TrueNAS install and restoring the zpool.cache file to that. (Disconnect your data disks while you're doing it, though.)
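For illustration only (the saved-cache path and the pool name are assumptions here; substitute wherever your copy actually lives), a saved cache file can be referenced at import time rather than written back onto the original system:

Code:
# Point the import at a copy of the old cache file, read-only, without touching the original media
zpool import -c /mnt/netshare/zpool.cache.old -o readonly=on Tank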

Regarding the differences, you've already hit the point of using the -FX switch, so the next step is going manually spelunking for older transaction groups with zdb and then trying to import the pool as of that time using -T txg (rough sketch below).
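Roughly, that looks like the following (a sketch only; the device path, pool name, and txg value are examples to be replaced with what zdb actually reports):

Code:
zdb -lu /dev/ada2p2                            # dump labels plus uberblocks to see which txgs still exist on disk
zpool import -o readonly=on -f -T <txg> Tank   # attempt a read-only import rolled back to that txg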
Awesome, I will use dd as that seems to be the best tool for the job. I will triple-check everything! You will probably have to walk me through the partition table editing process a little more once I get to that point. I'll get another drive ordered right now. As for handling the zpool.cache file, I have a fresh TrueNAS install on a USB stick that I can plug into my main PC along with the drives and run on bare metal.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
This will be the new plan.

New Drive arrived:
Drive A - wiped
Drive B - good <-- previously the "failed drive"; that information was outdated
Drive C - good
Drive D - New Drive
Drive E - New Drive

Method 1:

Drive A - wiped --cloned--> Drive D --> rebuild partition table/gptid labels on Drive A.
Drive B - resilver, if all goes well.
Drive C - good --cloned--> Drive E
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
If you come up with any other ideas or know of anyone who might have more ideas please feel free to post them here.
As a last resort before reverting to the backup from last year, you may give Klennet ZFS Recovery a try. The free version should identify which files are potentially recoverable from the damaged pool. However, an actual recovery operation would require coughing up the licence.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Awesome, I will use dd as that seems to be the best tool for the job. I will triple-check everything! You will probably have to walk me through the partition table editing process a little more once I get to that point. I'll get another drive ordered right now. As for handling the zpool.cache file, I have a fresh TrueNAS install on a USB stick that I can plug into my main PC along with the drives and run on bare metal.
Let's start with the zdb commands, looking at what you've got:

Code:
zdb -l /dev/ada0p2
zdb -l /dev/ada1p2
zdb -l /dev/ada2p2
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Let's start with the zdb commands, looking at what you've got:

Code:
zdb -l /dev/ada0p2
zdb -l /dev/ada1p2
zdb -l /dev/ada2p2
When doing "ada0p2" it says no such file or directory.

Code:
root@truenas[~]# zdb -l /dev/ada0
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas[~]# zdb -l /dev/ada1
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas[~]# zdb -l /dev/ada2
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas[~]#
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
When doing "ada1p2" it says no such file or directory.

Try both p1 and p2 on each drive anyways.

If the partition structure is only missing on one disk, we can figure out how to rebuild it. It would suck to bail on this just because one drive was missing a label.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Try both p1 and p2 on each drive anyways.

Also if you've added any new drives, be aware that FreeBSD device enumeration may not number them in the order you expect, so try "ada3" and "ada4" as well.
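A quick way to map the adaX names to physical drives (standard FreeBSD/TrueNAS CORE tools; match against the serial numbers you recorded) is:

Code:
camcontrol devlist                           # shows which adaX device is which physical disk
geom disk list | grep -E 'Geom name|ident'   # "ident" is the drive's serial number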
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
EDIT: @jgreco, jinx! You owe me a soda. :tongue:

Send me your shipping address. You will shortly receive a large box containing a two liter bottle. Ignore the beeping UPS and the shaking of the paint mixing machine that the bottle is packed inside. I assure you it is completely safe to open the bottle and enjoy your soda. Grinch style.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
ada0 - no luck. - This is the wiped drive.


Code:
root@truenas[~]# zdb -l /dev/ada1p1
failed to unpack label 0
failed to unpack label 1
failed to unpack label 2
failed to unpack label 3
root@truenas[~]# zdb -l /dev/ada1p2
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'Tank'
    state: 0
    txg: 2025670
    pool_guid: 2717787786726095806
    errata: 0
    hostid: 1361597103
    hostname: ''
    top_guid: 12486228298157547035
    guid: 3459701388371009720
    vdev_children: 2
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 12486228298157547035
        nparity: 1
        metaslab_array: 74
        metaslab_shift: 34
        ashift: 12
        asize: 11995904212992
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 2470301540868142256
            path: '/dev/gptid/11b39573-ad95-11ed-8d1c-7df9cea98351'
            DTL: 48182
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 3459701388371009720
            path: '/dev/gptid/11bac542-ad95-11ed-8d1c-7df9cea98351'
            DTL: 48181
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 16273966696595496550
            path: '/dev/gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351'
            DTL: 48180
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
root@truenas[~]#


Now we're getting somewhere.

Code:
root@truenas[~]# zdb -l /dev/ada2p2
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'Tank'
    state: 0
    txg: 2083363
    pool_guid: 2717787786726095806
    errata: 0
    hostid: 1361597103
    hostname: ''
    top_guid: 12486228298157547035
    guid: 16273966696595496550
    vdev_children: 2
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 12486228298157547035
        nparity: 1
        metaslab_array: 74
        metaslab_shift: 34
        ashift: 12
        asize: 11995904212992
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 2470301540868142256
            path: '/dev/gptid/11b39573-ad95-11ed-8d1c-7df9cea98351'
            DTL: 48182
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 3459701388371009720
            path: '/dev/gptid/11bac542-ad95-11ed-8d1c-7df9cea98351'
            not_present: 1
            DTL: 48181
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 16273966696595496550
            path: '/dev/gptid/11c0215d-ad95-11ed-8d1c-7df9cea98351'
            DTL: 48180
            create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
root@truenas[~]#


Now that ^ should be all for the 4 TB drives. The rest below will be the log/cache devices:

Code:
root@truenas[~]# zdb -l /dev/ada3p1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    state: 4
    guid: 8292917934443105375
    labels = 0 1 2 3
------------------------------------
L2ARC device header
------------------------------------
    magic: 6504978260106102853
    version: 1
    pool_guid: 2717787786726095806
    flags: 0
    start_lbps[0]: 20566878720
    start_lbps[1]: 20538562560
    log_blk_ent: 1022
    start: 4194816
    end: 34358951936
    evict: 20590295040
    lb_asize_refcount: 66048
    lb_count_refcount: 5
    trim_action_time: 0
    trim_state: 0

------------------------------------
L2ARC device log blocks
------------------------------------
log_blk_count:   1798 with valid cksum
                 0 with invalid cksum
log_blk_asize:   22749184

root@truenas[~]#


Code:
root@truenas[~]# zdb -l /dev/ada4p1
------------------------------------
LABEL 0
------------------------------------
    version: 5000
    name: 'Tank'
    state: 0
    txg: 2083363
    pool_guid: 2717787786726095806
    errata: 0
    hostid: 1361597103
    hostname: ''
    top_guid: 13912610063312002762
    guid: 13912610063312002762
    is_log: 1
    vdev_children: 2
    vdev_tree:
        type: 'disk'
        id: 1
        guid: 13912610063312002762
        path: '/dev/gptid/111fa2ca-ad95-11ed-8d1c-7df9cea98351'
        metaslab_array: 73
        metaslab_shift: 29
        ashift: 12
        asize: 34354757632
        is_log: 1
        DTL: 48179
        create_txg: 4
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
    labels = 0 1 2 3
root@truenas[~]#
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ada0 - no luck. - This is the wiped drive.

Ok. So, let's wait for everyone to catch up here and offer opinions.

My feeling is that we need to see if we can re-establish a partitioning scheme on ada0. If you are ordering spare drives, we need to wait for those, and then make a copy of ada0 onto one of them. You then unplug ada0 and set it aside while we hack on the new disk's label.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Ok. So, let's wait for everyone to catch up here and offer opinions.

My feeling is that we need to see if we can re-establish a partitioning scheme on ada0. If you are ordering spare drives, we need to wait for those, and then make a copy of ada0 onto one of them. You then unplug ada0 and set it aside while we hack on the new disk's label.
Sounds good. I can start the cloning process of ada0 to the blank drive that I currently have right now.

Once I get the OK I'll start the cloning process. (I'll be very careful :) )

Because I'll have the computer out, I'll also document all the S/Ns on the drives and post them here for safekeeping.
 
Joined
Oct 22, 2019
Messages
3,641
Not to go on a tangent, but for the sake of caution and sanity:

Before all of this started, why did you originally believe Drive B was failed/failing? (Did it spit out any errors? Were you alerted via the GUI or an email?)

Part of this whole process should minimize the risk of re-introducing a "potentially" failing drive.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ok. So, let's wait for everyone to catch up here and offer opinions.
Catching up. Stand by for effortpost.

@Dawson the good news is that the reason I've been AFK here is that I've been simulating this in my lab, and I've been able to recover from a Linux-issued wipefs against a RAIDZ1 with one drive pulled. It's messy, but it came back intact.
 
Joined
Oct 22, 2019
Messages
3,641
Did you do this against one of the drives of a currently imported and active pool? (Or did you issue wipefs against said drive, while the pool was exported? I.e, the drives were available to the system, but no chance of ZFS-related I/O to any of the drives.)

EDIT: Typical forum confusion. This reply was directed at @HoneyBadger, not @Dawson :wink:
 
Last edited:

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Not to go on a tangent, but for the sake of caution and sanity:

Before all of this started, why did you originally believe Drive B was failed/failing? (Did it spit out any errors? Were you alerted via the GUI or an email?)

Part of this whole process should minimize the risk of re-introducing a "potentially" failing drive.

The drive showed no issues and suddenly disappeared. I found the reason: it was a loose PSU cable. The drive is totally fine as far as I (and SMART) know. When I wiped it, the pool was imported and active, but I did the wipe with the TrueNAS VM shut down. Then I booted the VM back up and found out I had wiped the wrong drive.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Catching up. Stand by for effortpost.

@Dawson the good news is that the reason I've been AFK here is that I've been simulating this in my lab, and I've been able to recover from a Linux-issued wipefs against a RAIDZ1 with one drive pulled. It's messy, but it came back intact.

That is so good to hear. Thank you from the bottom of my heart. I won't do anything until I receive instructions from you, since I feel like you'd know best thanks to the lab testing.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Assumptions: All of your three drives are identical models. If we need to play with the partition sizes I'll need to do more.

In the examples below:

ada0 is "Drive A" that got the Proxmox wipe.
ada1 is "Drive B" that we thought failed, but it was just loose cabling and it's now recovered.
ada2 is "Drive C" that's the last disk standing.

You can see from ada1p2 that the last transaction group committed was 2025670, while ada2p2 is at 2083363. Assuming you basically hovered around the 5-second default txg timeout, that gap of 57,693 txgs (× 5 s ≈ 288,000 seconds) works out to roughly 80 hours.

Confirm with serial numbers that the order of these drives hasn't changed.
So, step zero is underway - clone Drive A to Clone A.

Assuming order has been maintained ada0 should have label 11b39573-ad95-11ed-8d1c-7df9cea98351

So, here's what we're going to do.

Finish your clone of Drive A to Clone A. Pull the original Drive A, set it aside, and replace it with Clone A. Get it presented back to the system in the exact same way. Instructions are in the spoiler.

Good. Buckle up.

Again, confirm with serial numbers that the order of these drives hasn't changed. We don't want to target the wrong drives.

Check the partition table on Drive C with
gpart backup ada2
If it looks good, and has output like below
Code:
GPT 128
1   freebsd-swap      128  4194304
2    freebsd-zfs  4194432 12582744

If it looks similar to the above (but with a way bigger number at the end) then:

Clone the partition table from Drive C to Clone A with
gpart backup ada2 | gpart restore ada0

Check the partition table on Clone A with
gpart backup ada0
It should be identical (same model drives, same partition layout)

See if you get an output from zdb -l ada0p2 now. If you do, then this is a good thing - check the txg number near the top. Hopefully it's closer to ada2p2's 2083363 than the older ada1p2 number.

Rewrite the missing GPTID of 11b39573-ad95-11ed-8d1c-7df9cea98351 to Clone A with
gpart modify -i2 -l 11b39573-ad95-11ed-8d1c-7df9cea98351 ada0
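As an optional sanity check before the reboot (my suggestion, not a required step), gpart can display the labels it now has on record:

Code:
gpart show -l ada0   # the label column for partition 2 should now read 11b39573-ad95-11ed-8d1c-7df9cea98351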

Reboot. Go back to the command line and check the results of zpool import which will hopefully give you the pool available for import:

Code:
root@freenas-lab[~]# zpool import
   pool: recoverme
     id: 9933807979428463458
  state: DEGRADED
status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:


You'll probably need to do zpool import -F or -FX as well.
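For reference, those recovery imports take the general form below; importing read-only is an extra precaution on my part, and the -X extreme rewind should be the last resort since it can discard more recent transactions:

Code:
zpool import -f -F -o readonly=on Tank      # recovery-mode import, discarding the last few txgs if needed
zpool import -f -F -X -o readonly=on Tank   # extreme rewind; only if plain -F fails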
 
Last edited:

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Did you do this against one of the drives of a currently imported and active pool? (Or did you issue wipefs against said drive, while the pool was exported? I.e, the drives were available to the system, but no chance of ZFS-related I/O to any of the drives.)

EDIT: Typical forum confusion. This reply was directed at @HoneyBadger, not @Dawson :wink:

Clobbered them live with wipefs -af from a separate SCSI initiator. The TrueNAS CORE machine wouldn't export the pool; it required me to force a shutdown and reboot, and then go about the label rebuild described above.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Assumptions: All of your three drives are identical models. If we need to play with the partition sizes I'll need to do more.

In the examples below:

ada0 is "Drive A" that got the Proxmox wipe.
ada1 is "Drive B" that we thought failed, but it was just loose cabling and it's now recovered.
ada2 is "Drive C" that's the last disk standing.

You can see from ada1p2 that the last transaction group committed was 2025670, while ada2p2 is at 2083363. Assuming you basically hovered around the 5-second default txg timeout, that gap of 57,693 txgs (× 5 s ≈ 288,000 seconds) works out to roughly 80 hours.

Confirm with serial numbers that the order of these drives hasn't changed.
So, step zero is underway - clone Drive A to Clone A.

Assuming order has been maintained ada0 should have label 11b39573-ad95-11ed-8d1c-7df9cea98351

So, here's what we're going to do.

Finish your clone of Drive A to Clone A. Pull the original Drive A, set it aside, and replace it with Clone A. Get it presented back to the system in the exact same way. Instructions are in the spoiler.

Good. Buckle up.

Again, confirm with serial numbers that the order of these drives hasn't changed. We don't want to target the wrong drives.

Check the partition table on Drive C with
gpart backup ada2
If it looks good, and has output like below
Code:
GPT 128
1   freebsd-swap      128  4194304
2    freebsd-zfs  4194432 12582744

If it looks similar to the above (but with a way bigger number at the end) then:

Clone the partition table from Drive C to Clone A with
gpart backup ada2 | gpart restore ada0

Check the partition table on Clone A with
gpart backup ada0
It should be identical (same model drives, same partition layout)

See if you get an output from zdb -l ada0p2 now. If you do, then this is a good thing - check the txg number near the top. Hopefully it's closer to ada2p2's 2083363 than the older ada1p2 number.

Rewrite the missing GPTID of 11b39573-ad95-11ed-8d1c-7df9cea98351 to Clone A with
gpart modify -i2 -l 11b39573-ad95-11ed-8d1c-7df9cea98351 ada0

Reboot. Go back to the command line and check the results of zpool import which will hopefully give you the pool available for import:

Code:
root@freenas-lab[~]# zpool import
   pool: recoverme
     id: 9933807979428463458
  state: DEGRADED
status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-2Q
 config:


You'll probably need to do zpool import -F or -FX as well.
Will do. They're all the same Toshiba MG03ACA400 4 TB drives. I will start cloning then!
 